Hamilton’s quaternion number system {\mathbb{H}} is a non-commutative extension of the complex numbers, consisting of numbers of the form {t + xi + yj + zk} where {t,x,y,z} are real numbers, and {i,j,k} are anti-commuting square roots of {-1} with {ij=k}, {jk=i}, {ki=j}. While they are non-commutative, they do keep many other properties of the complex numbers:

  • Being non-commutative, the quaternions do not form a field. However, they are still a skew field (or division ring): multiplication is associative, and every non-zero quaternion has a unique multiplicative inverse.
  • Like the complex numbers, the quaternions have a conjugation

    \displaystyle  \overline{t+xi+yj+zk} := t-xi-yj-zk,

    although this is now an antihomomorphism rather than a homomorphism: {\overline{qr} = \overline{r}\ \overline{q}}. One can then split up a quaternion {t + xi + yj + zk} into its real part {t} and imaginary part {xi+yj+zk} by the familiar formulae

    \displaystyle  \mathrm{Re} q := \frac{q + \overline{q}}{2}; \quad \mathrm{Im} q := \frac{q - \overline{q}}{2}

    (though we now leave the imaginary part purely imaginary, as opposed to dividing by {i} in the complex case).
  • The inner product

    \displaystyle  \langle q, r \rangle := \mathrm{Re} q \overline{r}

    is symmetric and positive definite (with {1,i,j,k} forming an orthonormal basis). Also, for any {q}, {q \overline{q}} is real, hence equal to {\langle q, q \rangle}. Thus we have a norm

    \displaystyle  |q| = \sqrt{q\overline{q}} = \sqrt{\langle q,q \rangle} = \sqrt{t^2 + x^2 + y^2 + z^2}.

    Since the real numbers commute with all quaternions, we have the multiplicative property {|qr| = |q| |r|}. In particular, the unit quaternions {U(1,\mathbb{H}) := \{ q \in \mathbb{H}: |q|=1\}} (also known as {SU(2)}, {Sp(1)}, or {Spin(3)}) form a compact group.
  • We have the cyclic trace property

    \displaystyle  \mathrm{Re}(qr) = \mathrm{Re}(rq)

    which allows one to take adjoints of left and right multiplication:

    \displaystyle  \langle qr, s \rangle = \langle q, s\overline{r}\rangle; \quad \langle rq, s \rangle = \langle q, \overline{r}s \rangle

  • As {i,j,k} are square roots of {-1}, we have the usual Euler formulae

    \displaystyle  e^{i\theta} = \cos \theta + i \sin \theta, e^{j\theta} = \cos \theta + j \sin \theta, e^{k\theta} = \cos \theta + k \sin \theta

    for real {\theta}, together with other familiar formulae such as {\overline{e^{i\theta}} = e^{-i\theta}}, {e^{i(\alpha+\beta)} = e^{i\alpha} e^{i\beta}}, {|e^{i\theta}| = 1}, etc.
We will use these sorts of algebraic manipulations in the sequel without further comment.

The unit quaternions {U(1,\mathbb{H}) = \{ q \in \mathbb{H}: |q|=1\}} act on the imaginary quaternions {\{ xi + yj + zk: x,y,z \in {\bf R}\} \equiv {\bf R}^3} by conjugation:

\displaystyle  v \mapsto q v \overline{q}.

This action is by orientation-preserving isometries, hence by rotations. It is not quite faithful, since conjugation by the unit quaternion {-1} is the identity, but one can show that this is the only loss of faithfulness, reflecting the well known fact that {U(1,\mathbb{H}) \equiv SU(2)} is a double cover of {SO(3)}.

For instance, for any real {\theta}, conjugation by {e^{i\theta/2} = \cos(\theta/2) + i \sin(\theta/2)} is a rotation by {\theta} around {i}:

\displaystyle  e^{i\theta/2} i e^{-i\theta/2} = i \ \ \ \ \ (1)

\displaystyle  e^{i\theta/2} j e^{-i\theta/2} = \cos(\theta) j - \sin(\theta) k \ \ \ \ \ (2)

\displaystyle  e^{i\theta/2} k e^{-i\theta/2} = \cos(\theta) k + \sin(\theta) j. \ \ \ \ \ (3)

Similarly for cyclic permutations of {i,j,k}. The doubling of the angle here can be explained from the Lie algebra fact that {[i,j]=ij-ji} is {2k} rather than {k}; it also closely related to the aforementioned double cover. We also of course have {U(1,\mathbb{H})\equiv Spin(3)} acting on {\mathbb{H}} by left multiplication; this is known as the spinor representation, but will not be utilized much in this post. (Giving {\mathbb{H}} the right action of {{\bf C}} makes it a copy of {{\bf C}^2}, and the spinor representation then also becomes the standard representation of {SU(2)} on {{\bf C}^2}.)

Given how quaternions relate to three-dimensional rotations, it is not surprising that one can also be used to recover the basic laws of spherical trigonometry – the study of spherical triangles on the unit sphere. This is fairly well known, but it took a little effort for me to locate the required arguments, so I am recording the calculations here.

The first observation is that every unit quaternion {q} induces a unit tangent vector {qj\overline{q}} on the unit sphere {S^2 \subset {\bf R}^3}, located at {qi\overline{q} \in S^2}; the third unit vector {qk\overline{q}} is then another tangent vector orthogonal to the first two (and oriented to the left of the original tangent vector), and can be viewed as the cross product of {qi\overline{q} \in S^2} and {qj\overline{q} \in S^2}. Right multplication of this quaternion then corresponds to various natural operations on this unit tangent vector:

  • Right multiplying {q} by {e^{i\theta/2}} does not affect the location {qi\overline{q}} of the tangent vector, but rotates the tangent vector {qj\overline{q}} anticlockwise by {\theta} in the direction of the orthogonal tangent vector {qk\overline{q}}, as it replaces {qj\overline{q}} by {\cos(\theta) qj\overline{q} + \sin(\theta) qk\overline{q}}.
  • Right multiplying {q} by {e^{k\theta/2}} advances the tangent vector by geodesic flow by angle {\theta}, as it replaces {qi\overline{q}} by {\cos(\theta) qi\overline{q} + \sin(\theta) qj\overline{q}}, and replaces {qj\overline{q}} by {\cos(\theta) qj\overline{q} - \sin(\theta) qi\overline{q}}.

Now suppose one has a spherical triangle with vertices {A,B,C}, with the spherical arcs {AB, BC, CA} subtending angles {c, a, b} respectively, and the vertices {A,B,C} subtending angles {\alpha,\beta,\gamma} respectively; suppose also that {ABC} is oriented in an anti-clockwise direction for sake of discussion. Observe that if one starts at {A} with a tangent vector oriented towards {B}, advances that vector by {c}, and then rotates by {\pi - \beta}, the tangent vector now at {B} and pointing towards {C}. If one advances by {a} and rotates by {\pi - \gamma}, one is now at {C} pointing towards {A}; and if one then advances by {b} and rotates by {\pi - \alpha}, one is back at {A} pointing towards {B}. This gives the fundamental relation

\displaystyle  e^{kc/2} e^{i(\pi-\beta)/2} e^{ka/2} e^{i(\pi-\gamma)/2} e^{kb/2} e^{i(\pi-\alpha)/2} = 1 \ \ \ \ \ (4)

relating the three sides and three equations of this triangle. (A priori, due to the lack of faithfulness of the {U(1,\mathbb{H})} action, the right-hand side could conceivably have been {-1} rather than {1}; but for extremely small triangles the right-hand side is clearly {1}, and so by continuity it must be {1} for all triangles.) Indeed, a moments thought will reveal that the condition (4) is necessary and sufficient for the data {a,b,c,\alpha,\beta,\gamma} to be associated with a spherical triangle. Thus one can view (4) as a “master equation” for spherical trigonometry: in principle, it can be used to derive all the other laws of this subject.

Remark 1 The law (4) has an evident symmetry {(a,b,c,\alpha,\beta,\gamma) \mapsto (\pi-\alpha,\pi-\beta,\pi-\gamma,\pi-a,\pi-b,\pi-c)}, which corresponds to the operation of replacing a spherical triangle with its dual triangle. Also, there is nothing particularly special about the choice of imaginaries {i,k} in (4); one can conjugate (4) by various quaternions and replace {i,k} here by any other orthogonal pair of unit quaternions.

Remark 2 If we work in the small scale regime, replacing {a,b,c} by {\varepsilon a, \varepsilon b, \varepsilon c} for some small {\varepsilon>0}, then we expect spherical triangles to behave like Euclidean triangles. Indeed, (4) to zeroth order becomes

\displaystyle  e^{i(\pi-\beta)/2} e^{i(\pi-\gamma)/2} e^{i(\pi-\alpha)/2} = 1

which reflects the classical fact that the sum of angles of a Euclidean triangle is equal to {\pi}. To first order, one obtains

\displaystyle  c + a e^{i(\pi-\gamma)/2} e^{i(\pi-\alpha)/2} + b e^{i(\pi-\alpha)/2} = 0

which reflects the evident fact that the vector sum of the sides of a Euclidean triangle sum to zero. (Geometrically, this correspondence reflects the fact that the action of the (projective) quaternion group on the unit sphere converges to the action of the special Euclidean group {SE(2)} on the plane, in a suitable asymptotic limit.)

The identity (4) is an identity of two unit quaternions; as the unit quaternion group {U(1,\mathbb{H})} is three-dimensional, this thus imposes three independent constraints on the six real parameters {a,b,c,\alpha,\beta,\gamma} of the spherical triangle. One can manipulate this constraint in various ways to obtain various trigonometric identities involving some subsets of these six parameters. For instance, one can rearrange (4) to get

\displaystyle  e^{i(\pi-\beta)/2} e^{ka/2} e^{i(\pi-\gamma)/2} = e^{-kc/2} e^{-i(\pi-\alpha)/2} e^{-kb/2}. \ \ \ \ \ (5)

Conjugating by {i} to reverse the sign of {k}, we also have

\displaystyle  e^{i(\pi-\beta)/2} e^{-ka/2} e^{i(\pi-\gamma)/2} = e^{kc/2} e^{-i(\pi-\alpha)/2} e^{kb/2}.

Taking the inner product of both sides of these identities, we conclude that

\displaystyle  \langle e^{i(\pi-\beta)/2} e^{ka/2} e^{i(\pi-\gamma)/2}, e^{i(\pi-\beta)/2} e^{-ka/2} e^{i(\pi-\gamma)/2} \rangle

is equal to

\displaystyle  \langle e^{-kc/2} e^{-i(\pi-\alpha)/2} e^{-kb/2}, e^{kc/2} e^{-i(\pi-\alpha)/2} e^{kb/2} \rangle.

Using the various properties of inner product, the former expression simplifies to {\mathrm{Re} e^{ka} = \cos a}, while the latter simplifies to

\displaystyle  \mathrm{Re} \langle e^{-i(\pi-\alpha)/2} e^{-kb} e^{i(\pi-\alpha)/2}, e^{kc} \rangle.

We can write {e^{kc} = \cos c + (\sin c) k} and

\displaystyle  e^{-i(\pi-\alpha)/2} e^{-kb} e^{i(\pi-\alpha)/2} = \cos b - (\sin b) (\cos(\pi-\alpha) k + \sin(\pi-\alpha) j)

so on substituting and simplifying we obtain

\displaystyle  \cos b \cos c + \sin b \sin c \cos \alpha = \cos a

which is the spherical cosine rule. Note in the infinitesimal limit (replacing {a,b,c} by {\varepsilon a, \varepsilon b, \varepsilon c}) this rule becomes the familiar Euclidean cosine rule

\displaystyle  c^2 = a^2 + b^2 - 2ab \cos \alpha.

In a similar fashion, from (5) we see that the quantity

\displaystyle  \langle e^{i(\pi-\beta)/2} e^{ka/2} e^{i(\pi-\gamma)/2} i e^{-i(\pi-\gamma)/2} e^{-ka/2} e^{-i(\pi-\beta)/2}, k \rangle

is equal to

\displaystyle  \langle e^{-kc/2} e^{-i(\pi-\alpha)/2} e^{-kb/2} i e^{kb/2} e^{i(\pi-\alpha)/2} e^{kc/2}, k \rangle.

The first expression simplifies by (1) and properties of the inner product to

\displaystyle  \langle e^{ka/2} i e^{-ka/2}, e^{-i(\pi-\beta)/2} k e^{i(\pi-\beta)/2} \rangle,

which by (2), (3) simplifies further to {-\sin a \sin \beta}. Similarly, the second expression simplifies to

\displaystyle  \langle e^{-kb/2} i e^{kb/2} , e^{i(\pi-\alpha)/2} k e^{-i(\pi-\alpha)/2}\rangle,

which by (2), (3) simplifies to {-\sin b \sin \alpha}. Equating the two and rearranging, we obtain

\displaystyle  \frac{\sin \alpha}{\sin a} = \frac{\sin \beta}{\sin b}

which is the spherical sine rule. Again, in the infinitesimal limit we obtain the familiar Euclidean sine rule

\displaystyle  \frac{\sin \alpha}{a} = \frac{\sin \beta}{b}.

As a variant of the above analysis, we have from (5) again that

\displaystyle  \langle e^{i(\pi-\beta)/2} e^{ka/2} e^{i(\pi-\gamma)/2} i e^{-i(\pi-\gamma)/2} e^{-ka/2} e^{-i(\pi-\beta)/2}, j \rangle

is equal to

\displaystyle  \langle e^{-kc/2} e^{-i(\pi-\alpha)/2} e^{-kb/2} i e^{kb/2} e^{i(\pi-\alpha)/2} e^{kc/2}, j \rangle.

As before, the first expression simplifies to

\displaystyle  \langle e^{ka/2} i e^{-ka/2}, e^{-i(\pi-\beta)/2} j e^{i(\pi-\beta)/2} \rangle

which equals {\sin a \cos \beta}. Meanwhile, the second expression can be rearranged as

\displaystyle  \langle e^{-i(\pi-\alpha)/2} e^{-kb/2} i e^{kb/2} e^{i(\pi-\alpha)/2}, e^{kc/2} j e^{-kc/2} \rangle.

By (2), (3) we can simplify to

\displaystyle  e^{-i(\pi-\alpha)/2} e^{-kb/2} i e^{kb/2} e^{i(\pi-\alpha)/2}

\displaystyle= (\cos b) i - (\sin b) \cos(\pi-\alpha) j + (\sin b) \sin(\pi-\alpha) k

and so the inner product is {\cos b \sin c - \cos b \sin c \cos \alpha}, leading to the “five part rule

\displaystyle  \cos b \sin c - \sin b \cos c \cos \alpha = \sin a \cos \beta.

In the case of a right-angled triangle {\beta=\pi/2}, this simplifies to one of Napier’s rules

\displaystyle  \cos \alpha = \frac{\tan c}{\tan b}, \ \ \ \ \ (6)

which in the infinitesimal limit is the familiar {\cos \alpha = \frac{c}{b}}. The other rules of Napier can be derived in a similar fashion.

Example 3 One application of Napier’s rule (6) is to determine the sunrise equation for when the sun rises and sets at a given location on the Earth, and a given time of year. For sake of argument let us work in summer, in which the declination {\delta} of the Sun is positive (due to axial tilt, it reaches a maximum of {23.5^\circ} at the summer solstice). Then the Sun subtends an angle of {\pi/2-\delta} from the pole star (Polaris in the northern hemisphere, Sigma Octantis in the southern hemisphere), and appears to rotate around that pole star once every {24} hours. On the other hand, if one is at a latitude {\phi}, then the pole star an elevation of {\phi} above the horizon. At extremely high latitudes {\phi > \pi/2-\delta}, the sun will never set (a phenomenon known as “midnight sun“); but in all other cases, at sunrise or sunset, the sun, pole star, and horizon point below the pole star will form a right-angled spherical triangle, with hypotenuse subtending an angle {\pi/2-\delta} and vertical side subtending an angle {\phi}. The angle subtended by the pole star in this triangle is {\pi-\omega}, where {\omega} is the solar hour angle {\omega} – the angle that the sun deviates from its noon position. Equation (6) then gives the sunrise equation

\displaystyle  \cos(\pi-\omega) = \frac{\tan \phi}{\tan(\pi/2-\delta)}

or equivalently

\displaystyle  \cos \omega = - \tan \phi \tan \delta.

A similar rule determines the time of sunset. In particular, the number of daylight hours in summer (assuming one is not in the midnight sun scenario {\phi > \pi/2 -\delta}) is given by

\displaystyle  24 - \frac{24}{\pi} \mathrm{arccos}(\tan \phi \tan \delta).

The situation in winter is similar, except that {\delta} is now negative, and polar night (no sunrise) occurs when {\phi > \pi/2+\delta}.

I’ve just uploaded to the arXiv the paper “On the distribution of eigenvalues of GUE and its minors at fixed index“. This is a somewhat technical paper establishing some estimates regarding one of the most well-studied random matrix models, the Gaussian Unitary Ensemble (GUE), that were not previously in the literature, but which will be needed for some forthcoming work of Hariharan Narayanan on the limiting behavior of “hives” with GUE boundary conditions (building upon our previous joint work with Sheffield).

For sake of discussion we normalize the GUE model to be the random {N \times N} Hermitian matrix {H} whose probability density function is proportional to {e^{-\mathrm{tr} H^2}}. With this normalization, the famous Wigner semicircle law will tell us that the eigenvalues {\lambda_1 \leq \dots \leq \lambda_N} of this matrix will almost all lie in the interval {[-\sqrt{2N}, \sqrt{2N}]}, and after dividing by {\sqrt{2N}}, will asymptotically be distributed according to the semicircle distribution

\displaystyle  \rho_{\mathrm{sc}}(x) := \frac{2}{\pi} (1-x^2)_+^{1/2}.

In particular, the normalized {i^{th}} eigenvalue {\lambda_i/\sqrt{2N}} should be close to the classical location {\gamma_{i/N}}, where {\gamma_{i/N}} is the unique element of {[-1,1]} such that

\displaystyle  \int_{-\infty}^{\gamma_{i/N}} \rho_{\mathrm{sc}}(x)\ dx = \frac{i}{N}.

Eigenvalues can be described by their index {i} or by their (normalized) energy {\lambda_i/\sqrt{2N}}. In principle, the two descriptions are related by the classical map {i \mapsto \gamma_{i/N}} defined above, but there are microscopic fluctuations from the classical location that create subtle technical difficulties between “fixed index” results in which one focuses on a single index {i} (and neighboring indices {i+1, i-1}, etc.), and “fixed energy” results in which one focuses on a single energy {x} (and eigenvalues near this energy). The phenomenon of eigenvalue rigidity does give some control on these fluctuations, allowing one to relate “averaged index” results (in which the index {i} ranges over a mesoscopic range) with “averaged energy” results (in which the energy {x} is similarly averaged over a mesoscopic interval), but there are technical issues in passing back from averaged control to pointwise control, either for the index or energy.

We will be mostly concerned in the bulk region where the index {i} is in an inteval of the form {[\delta n, (1-\delta)n]} for some fixed {\delta>0}, or equivalently the energy {x} is in {[-1+c, 1-c]} for some fixed {c > 0}. In this region it is natural to introduce the normalized eigenvalue gaps

\displaystyle  g_i := \sqrt{N/2} \rho_{\mathrm{sc}}(\gamma_{i/N}) (\lambda_{i+1} - \lambda_i).

The semicircle law predicts that these gaps {g_i} have mean close to {1}; however, due to the aforementioned fluctuations around the classical location, this type of claim is only easy to establish in the “fixed energy”, “averaged energy”, or “averaged index” settings; the “fixed index” case was only achieved by myself as recently as 2013, where I showed that each such gap in fact asymptotically had the expected distribution of the Gaudin law, using manipulations of determinantal processes. A significantly more general result, avoiding the use of determinantal processes, was subsequently obtained by Erdos and Yau.

However, these results left open the possibility of bad tail behavior at extremely large or small values of the gaps {g_i}; in particular, moments of the {g_i} were not directly controlled by previous results. The first result of the paper is to push the determinantal analysis further, and obtain such results. For instance, we obtain moment bounds

\displaystyle  \mathop{\bf E} g_i^p \ll_p 1

for any fixed {p > 0}, as well as an exponential decay bound

\displaystyle  \mathop{\bf P} (g_i > h) \ll \exp(-h/4)

for {0 < h \ll \log\log N}, and a lower tail bound

\displaystyle  \mathop{\bf P} (g_i \leq h) \ll h^{2/3} \log^{1/2} \frac{1}{h}

for any {h>0}. We also obtain good control on sums {g_i + \dots + g_{i+m-1}} of {m} consecutive gaps for any fixed {m}, showing that this sum has mean {m + O(\log^{4/3} (2+m))} and variance {O(\log^{7/3} (2+m))}. (This is significantly less variance than one would expect from a sum of {m} independent random variables; this variance reduction phenomenon is closely related to the eigenvalue rigidity phenomenon alluded to earlier, and reflects the tendency of eigenvalues to repel each other.)

A key point in these estimates is that no factors of {\log N} occur in the estimates, which is what one would obtain if one tried to use existing eigenvalue rigidity theorems. (In particular, if one normalized the eigenvalues {\lambda_i} at the same scale at the gap {g_i}, they would fluctuate by a standard deviation of about {\sqrt{\log N}}; it is only the gaps between eigenvalues that exhibit much smaller fluctuation.) On the other hand, the dependence on {h} is not optimal, although it was sufficient for the applications I had in mind.

As with my previous paper, the strategy is to try to replace fixed index events such as {g_i > h} with averaged energy events. For instance, if {g_i > h} and {i} has classical location {x}, then there is an interval of normalized energies {t} of length {\gg h}, with the property that there are precisely {N-i} eigenvalues to the right of {f_x(t)} and no eigenvalues in the interval {[f_x(t), f_x(t+h/2)]}, where

\displaystyle  f_x(t) = \sqrt{2N}( x + \frac{t}{N \rho_{\mathrm{sc}}(x)})

is an affine rescaling to the scale of the eigenvalue gap. So matters soon reduce to controlling the probability of the event

\displaystyle  (N_{x,t} = N-i) \wedge (N_{x,t,h/2} = 0)

where {N_{x,t}} is the number of eigenvalues to the right of {f_x(t)}, and {N_{x,t,h/2}} is the number of eigenvalues in the interval {[f_x(t), f_x(t+h/2)]}. These are fixed energy events, and one can use the theory of determinantal processes to control them. For instance, each of the random variables {N_{x,t}}, {N_{x,t,h/2}} separately have the distribution of sums of independent Boolean variables, which are extremely well understood. Unfortunately, the coupling is a problem; conditioning on the event {N_{x,t} = N-i}, in particular, affects the distribution of {N_{x,t,h/2}}, so that it is no longer the sum of independent Boolean variables. However, it is still a mixture of such sums, and with this (and the Plancherel-Rotach asymptotics for the GUE determinantal kernel) it is possible to proceed and obtain the above estimates after some calculation.

For the intended application to GUE hives, it is important to not just control gaps {g_i} of the eigenvalues {\lambda_i} of the GUE matrix {M}, but also the gaps {g'_i} of the eigenvalues {\lambda'_i} of the top left {N-1 \times N-1} minor {M'} of {M}. This minor of a GUE matrix is basically again a GUE matrix, so the above theorem applies verbatim to the {g'_i}; but it turns out to be necessary to control the joint distribution of the {g_i} and {g'_i}, and also of the interlacing gaps {\tilde g_i} between the {\lambda_i} and {\lambda'_i}. For fixed energy, these gaps are in principle well understood, due to previous work of Adler-Nordenstam-van Moerbeke and of Johansson-Nordenstam which show that the spectrum of both matrices is asymptotically controlled by the Boutillier bead process. This also gives averaged energy and averaged index results without much difficulty, but to get to fixed index information, one needs some universality result in the index {i}. For the gaps {g_i} of the original matrix, such a universality result is available due to the aforementioned work of Erdos and Yau, but this does not immediately imply the corresponding universality result for the joint distribution of {g_i} and {g'_i} or {\tilde g_i}. For this, we need a way to relate the eigenvalues {\lambda_i} of the matrix {M} to the eigenvalues {\lambda'_i} of the minors {M'}. By a standard Schur’s complement calculation, one can obtain the equation

\displaystyle a_{NN} - \lambda_i - \sum_{j=1}^{N-1}\frac{|X_j|^2}{\lambda'_j - \lambda_i} = 0

for all {i}, where {a_{NN}} is the bottom right entry of {M}, and {X_1,\dots,X_{N-1}} are complex gaussians independent of {\lambda'_j}. This gives a random system of equations to solve for {\lambda_i} in terms of {\lambda'_j}. Using the previous bounds on eigenvalue gaps (particularly the concentration results for sums of consecutive gaps), one can localize this equation to the point where a given {\lambda_i} is mostly controlled by a bounded number of nearby {\lambda'_j}, and hence a single gap {g_i} is mostly controlled by a bounded number of {g'_j}. From this, it is possible to leverage the existing universality result of Erdos and Yau to obtain universality of the joint distribution of {g_i} and {g'_i} (or of {\tilde g_i}). (The result can also be extended to more layers of the minor process than just two, as long as the number of minors is held fixed.)

This at last brings us to the final result of the paper, which is the one which is actually needed for the application to GUE hives. Here, one is interested in controlling the variance of a linear combination {\sum_{l=1}^m a_l \tilde g_{i+l}} of a fixed number {l} of consecutive interlacing gaps {\tilde g_{i+l}}, where the {a_l} are arbitrary deterministic coefficients. An application of the triangle and Cauchy-Schwarz inequalities, combined with the previous moment bounds on gaps, shows that this randomv ariable has variance {\ll m \sum_{l=1}^m |a_i|^2}. However, this bound is not expected to be sharp, due to the expected decay between correlations of eigenvalue gaps. In this paper, I improve the variance bound to

\displaystyle  \ll_A \frac{m}{\log^A(2+m)} \sum_{l=1}^m |a_i|^2

for any {A>0}, which is what was needed for the application.

This improvement reflects some decay in the covariances between distant interlacing gaps {\tilde g_i, \tilde g_{i+h}}. I was not able to establish such decay directly. Instead, using some Fourier analysis, one can reduce matters to studying the case of modulated linear statistics such as {\sum_{l=1}^m e(\xi l) \tilde g_{i+l}} for various frequencies {\xi}. In “high frequency” cases one can use the triangle inequality to reduce matters to studying the original eigenvalue gaps {g_i}, which can be handled by a (somewhat complicated) determinantal process calculation, after first using universality results to pass from fixed index to averaged index, thence to averaged energy, then to fixed energy estimates. For low frequencies the triangle inequality argument is unfavorable, and one has to instead use the determinantal kernel of the full minor process, and not just an individual matrix. This requires some classical, but tedious, calculation of certain asymptotics of sums involving Hermite polynomials.

The full argument is unfortunately quite complex, but it seems that the combination of having to deal with minors, as well as fixed indices, places this result out of reach of many prior methods.

Renaissance Philanthropy and XTX Markets have announced the launch of the AI for Math Fund, a new grant program supporting projects that apply AI and machine learning to mathematics, with a focus on automated theorem proving, with an initial $9.2 million in funding. The project funding categories, and examples of projects in such categories, are:

1. Production Grade Software Tools

  • AI-based autoformalization tools for translating natural-language mathematics into the formalisms of proof assistants
  • AI-based auto-informalization tools for translating proof-assistant proofs into interpretable natural-language mathematics
  • AI-based models for suggesting tactics/steps or relevant concepts to the user of a proof assistant, or for generating entire proofs
  • Infrastructure to connect proof assistants with computer algebra systems, calculus, and PDEs
  • A large-scale, AI-enhanced distributed collaboration platform for mathematicians

2. Datasets

  • Datasets of formalized theorems and proofs in a proof assistant
  • Datasets that would advance AI for theorem proving as applied to program verification and secure code generation
  • Datasets of (natural-language) mathematical problems, theorems, proofs, exposition, etc.
  • Benchmarks and training environments associated with datasets and model tasks (autoformalization, premise selection, tactic or proof generation, etc.)

3. Field Building

  • Textbooks
  • Courses
  • Documentation and support for proof assistants, and interfaces/APIs to integrate with AI tools

4. Breakthrough Ideas

  • Expected difficulty estimation (of sub-problems of a proof)
  • Novel mathematical implications of proofs formalized type-theoretically
  • Formalization of proof complexity in proof assistants

The deadline for initial expressions of interest is Jan 10, 2025.

[Disclosure: I have agreed to serve on the advisory board for this fund.]

Update: See also this discussion thread on possible projects that might be supported by this fund.

Vjeko Kovac and I have just uploaded to the arXiv our paper “On several irrationality problems for Ahmes series“. This paper resolves (or at least makes partial progress on) some open questions of Erdős and others on the irrationality of Ahmes series, which are infinite series of the form {\sum_{k=1}^\infty \frac{1}{a_k}} for some increasing sequence {a_k} of natural numbers. Of course, since most real numbers are irrational, one expects such series to “generically” be irrational, and we make this intuition precise (in both a probabilistic sense and a Baire category sense) in our paper. However, it is often difficult to establish the irrationality of any specific series. For example, it is already a non-trivial result of Erdős that the series {\sum_{k=1}^\infty \frac{1}{2^k-1}} is irrational, while the irrationality of {\sum_{p \hbox{ prime}} \frac{1}{2^p-1}} (equivalent to Erdős problem #69) remains open, although very recently Pratt established this conditionally on the Hardy–Littlewood prime tuples conjecture. Finally, the irrationality of {\sum_n \frac{1}{n!-1}} (Erdős problem #68) is completely open.

On the other hand, it has long been known that if the sequence {a_k} grows faster than {C^{2^k}} for any {C}, then the Ahmes series is necessarily irrational, basically because the fractional parts of {a_1 \dots a_m \sum_{k=1}^\infty \frac{1}{a_k}} can be arbitrarily small positive quantities, which is inconsistent with {\sum_{k=1}^\infty \frac{1}{a_k}} being rational. This growth rate is sharp, as can be seen by iterating the identity {\frac{1}{n} = \frac{1}{n+1} + \frac{1}{n(n+1)}} to obtain a rational Ahmes series of growth rate {(C+o(1))^{2^k}} for any fixed {C>1}.

In our paper we show that if {a_k} grows somewhat slower than the above sequences in the sense that {a_{k+1} = o(a_k^2)}, for instance if {a_k \asymp 2^{(2-\varepsilon)^k}} for a fixed {0 < \varepsilon < 1}, then one can find a comparable sequence {b_k \asymp a_k} for which {\sum_{k=1}^\infty \frac{1}{b_k}} is rational. This partially addresses Erdős problem #263, which asked if the sequence {a_k = 2^{2^k}} had this property, and whether any sequence of exponential or slower growth (but with {\sum_{k=1}^\infty 1/a_k} convergent) had this property. Unfortunately we barely miss a full solution of both parts of the problem, since the condition {a_{k+1} = o(a_k^2)} we need just fails to cover the case {a_k = 2^{2^k}}, and also does not quite hold for all sequences going to infinity at an exponential or slower rate.

We also show the following variant; if {a_k} has exponential growth in the sense that {a_{k+1} = O(a_k)} with {\sum_{k=1}^\infty \frac{1}{a_k}} convergent, then there exists nearby natural numbers {b_k = a_k + O(1)} such that {\sum_{k=1}^\infty \frac{1}{b_k}} is rational. This answers the first part of Erdős problem #264 which asked about the case {a_k = 2^k}, although the second part (which asks about {a_k = k!}) is slightly out of reach of our methods. Indeed, we show that the exponential growth hypothesis is best possible in the sense a random sequence {a_k} that grows faster than exponentially will not have this property, this result does not address any specific superexponential sequence such as {a_k = k!}, although it does apply to some sequence {a_k} of the shape {a_k = k! + O(\log\log k)}.

Our methods can also handle higher dimensional variants in which multiple series are simultaneously set to be rational. Perhaps the most striking result is this: we can find an increasing sequence {a_k} of natural numbers with the property that {\sum_{k=1}^\infty \frac{1}{a_k + t}} is rational for every rational {t} (excluding the cases {t = - a_k} to avoid division by zero)! This answers (in the negative) a question of Stolarsky Erdős problem #266, and also reproves Erdős problem #265 (and in the latter case one can even make {a_k} grow double exponentially fast).

Our methods are elementary and avoid any number-theoretic considerations, relying primarily on the countable dense nature of the rationals and an iterative approximation technique. The first observation is that the task of representing a given number {q} as an Ahmes series {\sum_{k=1}^\infty \frac{1}{a_k}} with each {a_k} lying in some interval {I_k} (with the {I_k} disjoint, and going to infinity fast enough to ensure convergence of the series), is possible if and only if the infinite sumset

\displaystyle  \frac{1}{I_1} + \frac{1}{I_2} + \dots

to contain {q}, where {\frac{1}{I_k} = \{ \frac{1}{a}: a \in I_k \}}. More generally, to represent a tuple of numbers {(q_t)_{t \in T}} indexed by some set {T} of numbers simultaneously as {\sum_{k=1}^\infty \frac{1}{a_k+t}} with {a_k \in I_k}, this is the same as asking for the infinite sumset

\displaystyle  E_1 + E_2 + \dots

to contain {(q_t)_{t \in T}}, where now

\displaystyle  E_k = \{ (\frac{1}{a+t})_{t \in T}: a \in I_k \}. \ \ \ \ \ (1)

So the main problem is to get control on such infinite sumsets. Here we use a very simple observation:

Proposition 1 (Iterative approximation) Let {V} be a Banach space, let {E_1,E_2,\dots} be sets with each {E_k} contained in the ball of radius {\varepsilon_k>0} around the origin for some {\varepsilon_k} with {\sum_{k=1}^\infty \varepsilon_k} convergent, so that the infinite sumset {E_1 + E_2 + \dots} is well-defined. Suppose that one has some convergent series {\sum_{k=1}^\infty v_k} in {V}, and sets {B_1,B_2,\dots} converging in norm to zero, such that

\displaystyle  v_k + B_k \subset E_k + B_{k+1} \ \ \ \ \ (2)

for all {k \geq 1}. Then the infinite sumset {E_1 + E_2 + \dots} contains {\sum_{k=1}^\infty v_k + B_1}.

Informally, the condition (2) asserts that {E_k} occupies all of {v_k + B_k} “at the scale {B_{k+1}}“.

Proof: Let {w_1 \in B_1}. Our task is to express {\sum_{k=1}^\infty v_k + w_1} as a series {\sum_{k=1}^\infty e_k} with {e_k \in E_k}. From (2) we may write

\displaystyle  \sum_{k=1}^\infty v_k + w_1 = \sum_{k=2}^\infty v_k + e_1 + w_2

for some {e_1 \in E_1} and {w_2 \in B_2}. Iterating this, we may find {e_k \in E_k} and {w_k \in B_k} such that

\displaystyle  \sum_{k=1}^\infty v_k + w_1 = \sum_{k=m+1}^\infty v_k + e_1 + e_2 + \dots + e_m + w_{m+1}

for all {m}. Sending {m \rightarrow \infty}, we obtain

\displaystyle  \sum_{k=1}^\infty v_k + w_1 = e_1 + e_2 + \dots

as required. \Box

In one dimension, sets of the form {\frac{1}{I_k}} are dense enough that the condition (2) can be satisfied in a large number of situations, leading to most of our one-dimensional results. In higher dimension, the sets {E_k} lie on curves in a high-dimensional space, and so do not directly obey usable inclusions of the form (2); however, for suitable choices of intervals {I_k}, one can take some finite sums {E_{k+1} + \dots + E_{k+d}} which will become dense enough to obtain usable inclusions of the form (2) once {d} reaches the dimension of the ambient space, basically thanks to the inverse function theorem (and the non-vanishing curvatures of the curve in question). For the Stolarsky problem, which is an infinite-dimensional problem, it turns out that one can modify this approach by letting {d} grow slowly to infinity with {k}.

Kaisa Matomäki, Maksym Radziwill, Fernando Xuancheng Shao, Joni Teräväinen, and myself have (finally) uploaded to the arXiv our paper “Higher uniformity of arithmetic functions in short intervals II. Almost all intervals“. This is a sequel to our previous paper from 2022. In that paper, discorrelation estimates such as

\displaystyle  \sum_{x \leq n \leq x+H} (\Lambda(n) - \Lambda^\sharp(n)) \bar{F}(g(n)\Gamma) = o(H)

were established, where {\Lambda} is the von Mangoldt function, {\Lambda^\sharp} was some suitable approximant to that function, {F(g(n)\Gamma)} was a nilsequence, and {[x,x+H]} was a reasonably short interval in the sense that {H \sim x^{\theta+\varepsilon}} for some {0 < \theta < 1} and some small {\varepsilon>0}. In that paper, we were able to obtain non-trivial estimates for {\theta} as small as {5/8}, and for some other functions such as divisor functions {d_k} for small values of {k}, we could lower {\theta} somewhat to values such as {3/5}, {5/9}, {1/3} of {\theta}. This had a number of analytic number theory consequences, for instance in obtaining asymptotics for additive patterns in primes in such intervals. However, there were multiple obstructions to lowering {\theta} much further. Even for the model problem when {F(g(n)\Gamma) = 1}, that is to say the study of primes in short intervals, until recently the best value of {\theta} available was {7/12}, although this was very recently improved to {17/30} by Guth and Maynard.

However, the situation is better when one is willing to consider estimates that are valid for almost all intervals, rather than all intervals, so that one now studies local higher order uniformity estimates of the form

\displaystyle  \int_X^{2X} \sup_{F,g} | \sum_{x \leq n \leq x+H} (\Lambda(n) - \Lambda^\sharp(n)) \bar{F}(g(n)\Gamma)|\ dx = o(XH)

where {H = X^{\theta+\varepsilon}} and the supremum is over all nilsequences of a certain Lipschitz constant on a fixed nilmanifold {G/\Gamma}. This generalizes local Fourier uniformity estimates of the form

\displaystyle  \int_X^{2X} \sup_{\alpha} | \sum_{x \leq n \leq x+H} (\Lambda(n) - \Lambda^\sharp(n)) e(-\alpha n)|\ dx = o(XH).

There is particular interest in such estimates in the case of the Möbius function {\mu(n)} (where, as per the Möbius pseudorandomness conjecture, the approximant {\mu^\sharp} should be taken to be zero, at least in the absence of a Siegel zero). This is because if one could get estimates of this form for any {H} that grows sufficiently slowly in {X} (in particular {H = \log^{o(1)} X}), this would imply the (logarithmically averaged) Chowla conjecture, as I showed in a previous paper.

While one can lower {\theta} somewhat, there are still barriers. For instance, in the model case {F \equiv 1}, that is to say prime number theorems in almost all short intervals, until very recently the best value of {\theta} was {1/6}, recently lowered to {2/15} by Guth and Maynard (and can be lowered all the way to zero on the Density Hypothesis). Nevertheless, we are able to get some improvements at higher orders:

  • For the von Mangoldt function, we can get {\theta} as low as {1/3}, with an arbitrary logarithmic saving {\log^{-A} X} in the error terms; for divisor functions, one can even get power savings in this regime.
  • For the Möbius function, we can get {\theta=0}, recovering our previous result with Tamar Ziegler, but now with {\log^{-A} X} type savings in the exceptional set (though not in the pointwise bound outside of the set).
  • We can now also get comparable results for the divisor function.

As sample applications, we can obtain Hardy-Littlewood conjecture asymptotics for arithmetic progressions of almost all given steps {h \sim X^{1/3+\varepsilon}}, and divisor correlation estimates on arithmetic progressions for almost all {h \sim X^\varepsilon}.

Our proofs are rather long, but broadly follow the “contagion” strategy of Walsh, generalized from the Fourier setting to the higher order setting. Firstly, by standard Heath–Brown type decompositions, and previous results, it suffices to control “Type II” discorrelations such as

\displaystyle  \sup_{F,g} | \sum_{x \leq n \leq x+H} \alpha*\beta(n) \bar{F}(g(n)\Gamma)|

for almost all {x}, and some suitable functions {\alpha,\beta} supported on medium scales. So the bad case is when for most {x}, one has a discorrelation

\displaystyle  |\sum_{x \leq n \leq x+H} \alpha*\beta(n) \bar{F_x}(g_x(n)\Gamma)| \gg H

for some nilsequence {F_x(g_x(n) \Gamma)} that depends on {x}.

The main issue is the dependency of the polynomial {g_x} on {x}. By using a “nilsequence large sieve” introduced in our previous paper, and removing degenerate cases, we can show a functional relationship amongst the {g_x} that is very roughly of the form

\displaystyle  g_x(an) \approx g_{x'}(a'n)

whenever {n \sim x/a \sim x'/a'} (and I am being extremely vague as to what the relation “{\approx}” means here). By a higher order (and quantitatively stronger) version of Walsh’s contagion analysis (which is ultimately to do with separation properties of Farey sequences), we can show that this implies that these polynomials {g_x(n)} (which exert influence over intervals {[x,x+H]}) can “infect” longer intervals {[x', x'+Ha]} with some new polynomials {\tilde g_{x'}(n)} and various {x' \sim Xa}, which are related to many of the previous polynomials by a relationship that looks very roughly like

\displaystyle  g_x(n) \approx \tilde g_{ax}(an).

This can be viewed as a rather complicated generalization of the following vaguely “cohomological”-looking observation: if one has some real numbers {\alpha_i} and some primes {p_i} with {p_j \alpha_i \approx p_i \alpha_j} for all {i,j}, then one should have {\alpha_i \approx p_i \alpha} for some {\alpha}, where I am being vague here about what {\approx} means (and why it might be useful to have primes). By iterating this sort of contagion relationship, one can eventually get the {g_x(n)} to behave like an Archimedean character {n^{iT}} for some {T} that is not too large (polynomial size in {X}), and then one can use relatively standard (but technically a bit lengthy) “major arc” techiques based on various integral estimates for zeta and {L} functions to conclude.

The day after the election, I found myself struggling with how to approach the complex analysis class I was teaching. Could I ignore the (almost literal) elephant in the room? Would my students be in the right mental state to learn math? Would I be in the right mental state to teach it?

I opened with the statement that usually in math we have the luxury of working in abstractions far removed from the real world. We are familiar with addressing mathematical problems with the (inessential) connections to the real world stripped away, leaving only the essential features to focus one’s attention. An election, for instance, might be treated as the outcome of N people, each of which has a probability p of voting for one candidate, and 1-p for another… and one can then try to analyze the problem from a dispassionate mathematical perspective. This type of mindset can be illuminating in many contexts. Real world events have real consequences, however, and in light of an event as consequential as the last election, a math lecture on contour integration or the central limit theorem may seem meaningless.

But there is one precious thing mathematics has, that almost no other field currently enjoys: a consensus on what the ground truth is, and how to reach it. Because of this, even the strongest differences of opinion in mathematics can eventually be resolved, and mistakes realized and corrected. This consensus is so strong, we simply take it for granted: a solution is correct or incorrect, a theorem is proved or not proved, and when a problem is solved, we simply move on to the next one. This is, sadly, not a state of affairs elsewhere. But if my students can learn from this and carry these skills— such as distinguishing an overly simple but mathematically flawed “solution” from a more complex, but accurate actual solution—to other spheres that have more contact with the real world, then my math lectures have consequence. Even—or perhaps, especially—in times like these.

I have been collaborating for many years with a long-time personal friend and polymath, Tanya Klowden, on a popular book on astronomy tentatively entitled “Climbing the cosmic distance ladder“, which was initially based on a public lecture I have given on this topic for many years, but which we have found to be a far richer story than what I was aware of when I first gave those lectures. We had previously released a sample chapter on this blog back in 2020. The project had stalled for a few years for a number of reasons (including some pandemic-related issues); however, we are now trying to get it back on track. One of the steps we are taking to do so is to launch an Instagram account for the book, in which we aim to make some regular posts (tentatively on a weekly basis) on astronomical tidbits that we have encountered in the course of researching our book project. Tanya has started this series by sharing her experiences on comet-hunting in Hampstead Heath, following in the footsteps of the legendary comet-hunting 19th century astronomer Caroline Herschel (who, together with her brother William Herschel, will make a significant appearance in the “seventh rung” chapter of our book). I plan to follow up with an instagram post of my own in the next few days.

Almost three weeks ago, I proposed a collaborative project, combining the efforts of professional and amateur mathematicians, automatic theorem provers, AI tools, and the proof assistant language Lean, to describe the implication graph relating the 4694 equational laws for magmas that can be expressed using up to four invocations of the magma operation. That is to say, one needs to determine the truth or falsity of the {4694*(4694-1)=22028942} possible implications between the these 4694 laws.

The project was launched on the day of the blog post, and has been running for a hectic 19 days thus far; see my personal log of the project for a day-by-day summary of events. From the perspective of raw implications resolved, the project is (of the time of writing) 99.9963% complete: of the {22028942} implications to resolve, {8178279} have been proven to be true, {13854531} have been proven to be false, and only {826} remain open, although even within this set, there are {249} implications that we conjecture to be false and for which we are likely to be able to formally disprove soon. For reasons of compilation efficiency, we do not record the proof of every single one of these assertions in Lean; we only prove a smaller set of {592790} implications in Lean, which then imply the broader set of implications through transitivity (for instance, using the fact that if Equation {X} implies Equation {Y} and Equation {Y} implies Equation {Z}, then Equation {X} implies Equation {Z}); we will also shortly implement a further reduction utilizing a duality symmetry of the implication graph.

Thanks to the tireless efforts of many volunteer contributors to the project, we now have a number of nice visualization tools to inspect various portions of the (not quite completed) implication graph. For instance, this graph depicts all the consequences of Equation 1491: {x = (y \diamond x) \diamond (y \diamond (y \diamond x))}, which I have nicknamed the “Obelix law” (and it has a companion, the “Asterix law“, Equation 65: {x = (y \diamond (x \diamond (y \diamond x)))}). And here is a table of all the equational laws we are studying, together with a count of how many laws they imply, or are implied by. These interfaces are also somewhat integrated with Lean: for instance, you can click here to try your hand at showing that the Obelix law implies Equation 359, {x \diamond x = (x \diamond x) \diamond x}; I’ll leave this as a challenge (a four-line proof in Lean is possible).

Over the last few weeks, I’ve learned that many of these laws have previously appeared in the literature, and compiled a “tour” of these equations here. For instance, in addition to the very well known commutative law (Equation 43) and associative law (Equation 4512), some equations (Equation 14, Equation 29, Equation 381, Equation 3722, and Equation 3744) appeared in some Putnam math competitions; Equation 168 defines a fascinating structure, known as a “central groupoid”, that was studied in particular by Evans and by Knuth, and was a key inspiration for the Knuth-Bendix completion algorithm; and Equation 1571 classifies abelian groups of exponent two.

Thanks to the Birkhoff completeness theorem, if one equational law implies another, then it can be proven by a finite number of rewrite operations; however the number of rewrites needed could be quite lengthy. The implication of 359 from 1491 mentioned above is already moderately challenging, requiring four or five rewrites; the implication of Equation 2 from Equation 1689 is incredibly long (try it!). Nevertheless, standard automated theorem provers, such as Vampire, are quite capable of proving the vast majority of these implications.

More subtle are the anti-implications, in which we have to show that a law {X} does not imply a law {Y}. In principle, one just has to exhibit a magma that obeys {X} but not {Y}. In a large fraction of cases, one can simply search through small finite magmas – such as magmas on two, three, or four elements – to obtain this anti-implication; but they do not always suffice, and in fact we know of anti-implications that can only be proven through a construction of an infinite magma. For instance, the “Asterix law” is now known (from the efforts of this project) to not imply the “Obelix law”, but all counterexamples are necessarily infinite. Curiously, the known constructions have some affinity with the famous technique of forcing in set theory, in that we continually add “generic” elements to a (partial) magma in order to force a counterexample with certain specified properties to exist, though the constructions here are certainly far simpler than in the set-theoretic constructions.

We have also obtained profitable mileage out of constructions of “linear” magmas {x \diamond y = ax + by} in both commutative and non-commutative rings; free magmas associated to “confluent” equational laws, and more generally laws with complete rewriting systems. As such, the number of unresolved implications continues to shrink at a steady pace, although we are not yet ready to declare victory on the project.

After quite hectic amount of back end setup and “putting out fires”, the project is now running fairly smoothly, with activity coordinated on a Lean Zulip channel, and all contributions going through a pull request process on Github, and tracked via an issues-based Github project with the invaluable oversight provided by the other two maintainers of the project, Pietro Monticone and Shreyas Srinivas. In contrast to the prior PFR formalization project, the workflow follows standard Github practices and proceeds roughly as follows: if, during the course of the Zulip discussion, it becomes clear that some specific task needs to be done to move the project forward (e.g., to formalize in Lean the proof of an implication that had been worked out in the discussion threads), an “issue” is made (often by myself or one of the other maintainers), which other contributors can then “claim”, work on separately (using a local copy of the main Github repository), and then submit a “pull request” to merge their contribution back into the main repository. This request can then be reviewed by both maintainers and other contributors, and if approved, closes the relevant issue.

More generally, we are trying to document all of the processes and lessons learned from this setup, and this will be part of a forthcoming paper on this project, which we are now in the preliminary stages of planning, and will likely include dozens of authors.

While the project is still ongoing, I can say that I am quite satisfied with the progress accomplished to date, and that many of my hopes for such a project have already been realized. On the scientific side, we have discovered some new techniques and constructions to show that a given equational theory does not imply another one, and have also discovered some exotic algebraic structures, such as the “Asterix” and “Obelix” pair, that have interesting features, and which would likely not have been discovered by any means other than the type of systematic search conducted here. The participants are very diverse, ranging from mathematicians and computer scientists at all stages of career, to interested students and amateurs. The Lean platform has worked well in integrating both human-generated and machine-generated contributions; the latter are numerically by far the largest source of contributions, but many of the automatically generated results were first obtained in special cases by humans, and then generalized and formalized (often by different members of the project). We still make many informal mathematical arguments on the discussion thread, but they tend to be rapidly formalized in Lean, at which point disputes about correctness disappear, and we can instead focus on how best to deploy various verified techniques to tackle the remaining implications.

Perhaps the only thing that I was expecting to see at this point that has not yet materialized is significant contributions from modern AI tools. They are being used in a number of secondary ways in this project, for instance through tools such as Github copilot to speed up the writing of Lean proofs, the LaTeX blueprint, and other software code, and several of our visualization tools were also largely co-written using large language models such as Claude. However, for the core task of resolving implications, the more “good old-fashioned” automated theorem provers have so far proven superior. However, most of the remaining 700 or so implications are not amenable to these older tools, and several (particularly the ones involving “Asterix” and “Obelix” had stymied the human collaborators for several days), so I can still see a role for modern AI to play a more active role in finishing off the hardest and most stubborn of the remaining implications.

Traditionally, mathematics research projects are conducted by a small number (typically one to five) of expert mathematicians, each of which are familiar enough with all aspects of the project that they can verify each other’s contributions. It has been challenging to organize mathematical projects at larger scales, and particularly those that involve contributions from the general public, due to the need to verify all of the contributions; a single error in one component of a mathematical argument could invalidate the entire project. Furthermore, the sophistication of a typical math project is such that it would not be realistic to expect a member of the public, with say an undergraduate level of mathematics education, to contribute in a meaningful way to many such projects.

For related reasons, it is also challenging to incorporate assistance from modern AI tools into a research project, as these tools can “hallucinate” plausible-looking, but nonsensical arguments, which therefore need additional verification before they could be added into the project.

Proof assistant languages, such as Lean, provide a potential way to overcome these obstacles, and allow for large-scale collaborations involving professional mathematicians, the broader public, and/or AI tools to all contribute to a complex project, provided that it can be broken up in a modular fashion into smaller pieces that can be attacked without necessarily understanding all aspects of the project as a whole. Projects to formalize an existing mathematical result (such as the formalization of the recent proof of the PFR conjecture of Marton, discussed in this previous blog post) are currently the main examples of such large-scale collaborations that are enabled via proof assistants. At present, these formalizations are mostly crowdsourced by human contributors (which include both professional mathematicians and interested members of the general public), but there are also some nascent efforts to incorporate more automated tools (either “good old-fashioned” automated theorem provers, or more modern AI-based tools) to assist with the (still quite tedious) task of formalization.

However, I believe that this sort of paradigm can also be used to explore new mathematics, as opposed to formalizing existing mathematics. The online collaborative “Polymath” projects that several people including myself organized in the past are one example of this; but as they did not incorporate proof assistants into the workflow, the contributions had to be managed and verified by the human moderators of the project, which was quite a time-consuming responsibility, and one which limited the ability to scale these projects up further. But I am hoping that the addition of proof assistants will remove this bottleneck.

I am particularly interested in the possibility of using these modern tools to explore a class of many mathematical problems at once, as opposed to the current approach of focusing on only one or two problems at a time. This seems like an inherently modularizable and repetitive task, which could particularly benefit from both crowdsourcing and automated tools, if given the right platform to rigorously coordinate all the contributions; and it is a type of mathematics that previous methods usually could not scale up to (except perhaps over a period of many years, as individual papers slowly explore the class one data point at a time until a reasonable intuition about the class is attained). Among other things, having a large data set of problems to work on could be helpful for benchmarking various automated tools and compare the efficacy of different workflows.

One recent example of such a project was the Busy Beaver Challenge, which showed this July that the fifth Busy Beaver number {BB(5)} was equal to {47176870}. Some older crowdsourced computational projects, such as the Great Internet Mersenne Prime Search (GIMPS), are also somewhat similar in spirit to this type of project (though using more traditional proof of work certificates instead of proof assistants). I would be interested in hearing of any other extant examples of crowdsourced projects exploring a mathematical space, and whether there are lessons from those examples that could be relevant for the project I propose here.

More specifically I would like to propose the following (admittedly artificial) project as a pilot to further test out this paradigm, which was inspired by a MathOverflow question from last year, and discussed somewhat further on my Mastodon account shortly afterwards.

The problem is in the field of universal algebra, and concerns the (medium-scale) exploration of simple equational theories for magmas. A magma is nothing more than a set {G} equipped with a binary operation {\circ: G \times G \rightarrow G}. Initially, no additional axioms on this operation {\circ} are imposed, and as such magmas by themselves are somewhat boring objects. Of course, with additional axioms, such as the identity axiom or the associative axiom, one can get more familiar mathematical objects such as groups, semigroups, or monoids. Here we will be interested in (constant-free) equational axioms, which are axioms of equality involving expressions built from the operation {\circ} and one or more indeterminate variables in {G}. Two familiar examples of such axioms are the commutative axiom

\displaystyle  x \circ y = y \circ x

and the associative axiom

\displaystyle  (x \circ y) \circ z = x \circ (y \circ z),

where {x,y,z} are indeterminate variables in the magma {G}. On the other hand the (left) identity axiom {e \circ x = x} would not be considered an equational axiom here, as it involves a constant {e \in G} (the identity element), which we will not consider here.

To illustrate the project I have in mind, let me first introduce eleven examples of equational axioms for magmas:

  • Equation1: {x=y}
  • Equation2: {x \circ y = z \circ w}
  • Equation3: {x \circ y = x}
  • Equation4: {(x \circ x) \circ y = y \circ x}
  • Equation5: {x \circ (y \circ z) = (w \circ u) \circ v}
  • Equation6: {x \circ y = x \circ z}
  • Equation7: {x \circ y = y \circ x}
  • Equation8: {x \circ (y \circ z) = (x \circ w) \circ u}
  • Equation9: {x \circ (y \circ z) = (x \circ y) \circ w}
  • Equation10: {x \circ (y \circ z) = (x \circ y) \circ z}
  • Equation11: {x = x}
Thus, for instance, Equation7 is the commutative axiom, and Equation10 is the associative axiom. The constant axiom Equation1 is the strongest, as it forces the magma {G} to have at most one element; at the opposite extreme, the reflexive axiom Equation11 is the weakest, being satisfied by every single magma.

One can then ask which axioms imply which others. For instance, Equation1 implies all the other axioms in this list, which in turn imply Equation11. Equation8 implies Equation9 as a special case, which in turn implies Equation10 as a special case. The full poset of implications can be depicted by the following Hasse diagram:

This in particular answers the MathOverflow question of whether there were equational axioms intermediate between the constant axiom Equation1 and the associative axiom Equation10.

Most of the implications here are quite easy to prove, but there is one non-trivial one, obtained in this answer to a MathOverflow post closely related to the preceding one:

Proposition 1 Equation4 implies Equation7.

Proof: Suppose that {G} obeys Equation4, thus

\displaystyle  (x \circ x) \circ y = y \circ x \ \ \ \ \ (1)

for all {x,y \in G}. Specializing to {y=x \circ x}, we conclude

\displaystyle (x \circ x) \circ (x \circ x) = (x \circ x) \circ x

and hence by another application of (1) we see that {x \circ x} is idempotent:

\displaystyle  (x \circ x) \circ (x \circ x) = x \circ x. \ \ \ \ \ (2)

Now, replacing {x} by {x \circ x} in (1) and then using (2), we see that

\displaystyle  (x \circ x) \circ y = y \circ (x \circ x),

so in particular {x \circ x} commutes with {y \circ y}:

\displaystyle  (x \circ x) \circ (y \circ y) = (y \circ y) \circ (x \circ x). \ \ \ \ \ (3)

Also, from two applications (1) one has

\displaystyle  (x \circ x) \circ (y \circ y) = (y \circ y) \circ x = x \circ y.

Thus (3) simplifies to {x \circ y = y \circ x}, which is Equation7. \Box

A formalization of the above argument in Lean can be found here.

I will remark that the general question of determining whether one set of equational axioms determines another is undecidable; see Theorem 14 of this paper of Perkins. (This is similar in spirit to the more well known undecidability of various word problems.) So, the situation here is somewhat similar to the Busy Beaver Challenge, in that past a certain point of complexity, we would necessarily encounter unsolvable problems; but hopefully there would be interesting problems and phenomena to discover before we reach that threshold.

The above Hasse diagram does not just assert implications between the listed equational axioms; it also asserts non-implications between the axioms. For instance, as seen in the diagram, the commutative axiom Equation7 does not imply the Equation4 axiom

\displaystyle  (x+x)+y = y + x.

To see this, one simply has to produce an example of a magma that obeys the commutative axiom Equation7, but not the Equation4 axiom; but in this case one can simply choose (for instance) the natural numbers {{\bf N}} with the addition operation {x \circ y := x+y}. More generally, the diagram asserts the following non-implications, which (together with the indicated implications) completely describes the poset of implications between the eleven axioms:
  • Equation2 does not imply Equation3.
  • Equation3 does not imply Equation5.
  • Equation3 does not imply Equation7.
  • Equation5 does not imply Equation6.
  • Equation5 does not imply Equation7.
  • Equation6 does not imply Equation7.
  • Equation6 does not imply Equation10.
  • Equation7 does not imply Equation6.
  • Equation7 does not imply Equation10.
  • Equation9 does not imply Equation8.
  • Equation10 does not imply Equation9.
  • Equation10 does not imply Equation6.
The reader is invited to come up with counterexamples that demonstrate some of these implications. The hardest type of counterexamples to find are the ones that show that Equation9 does not imply Equation8: a solution (in Lean) can be found here. I placed proofs in Lean of all the above implications and anti-implications can be found in this github repository file.

As one can see, it is already somewhat tedious to compute the Hasse diagram of just eleven equations. The project I propose is to try to expand this Hasse diagram by a couple orders of magnitude, covering a significantly larger set of equations. The set I propose is the set {{\mathcal E}} of equations that use the magma operation {\circ} at most four times, up to relabeling and the reflexive and symmetric axioms of equality; this includes the eleven equations above, but also many more. How many more? Recall that the Catalan number {C_n} is the number of ways one can form an expression out of {n} applications of a binary operation {\circ} (applied to {n+1} placeholder variables); and, given a string of {m} placeholder variables, the Bell number {B_m} is the number of ways (up to relabeling) to assign names to each of these variables, where some of the placeholders are allowed to be assigned the same name. As a consequence, ignoring symmetry, the number of equations that involve at most four operations is

\displaystyle  \sum_{n,m \geq 0: n+m \leq 4} C_n C_m B_{n+m+2} = 9131.

The number of equations in which the left-hand side and right-hand side are identical is

\displaystyle  \sum_{n=0}^2 C_n B_{n+1} = 1 * 1 + 1 * 2 + 2 * 5 = 13;

these are all equivalent to reflexive axiom (Equation11). The remaining {9118} equations come in pairs by the symmetry of equality, so the total size of {{\mathcal E}} is

\displaystyle  1 + \frac{9118}{2} = 4560.

I have not yet generated the full list of such identities, but presumably this will be straightforward to do in a standard computer language such as Python (I have not tried this, but I imagine some back-and-forth with a modern AI would let one generate most of the required code). [UPDATE, Sep 26: Amir Livne Bar-on has kindly enumerated all the equations, of which there are actually 4694.]

It is not clear to me at all what the geometry of {{\mathcal E}} will look like. Will most equations be incomparable with each other? Will it stratify into layers of “strong” and “weak” axioms? Will there be a lot of equivalent axioms? It might be interesting to record now any speculations as what the structure of this poset, and compare these predictions with the outcome of the project afterwards.

A brute force computation of the poset {{\mathcal E}} would then require {4560 \times (4560-1) = 20789040} comparisons, which looks rather daunting; but of course due to the axioms of a partial order, one could presumably identify the poset by a much smaller number of comparisons. I am thinking that it should be possible to crowdsource the exploration of this poset in the form of submissions to a central repository (such as the github repository I just created) of proofs in Lean of implications or non-implications between various equations, which could be validated in Lean, and also checked against some file recording the current status (true, false, or open) of all the {20789040} comparisons, to avoid redundant effort. Most submissions could be handled automatically, with relatively little human moderation required; and the status of the poset could be updated after each such submission.

I would imagine that there is some “low-hanging fruit” that could establish a large number of implications (or anti-implications) quite easily. For instance, laws such as Equation2 or Equation3 more or less completely describe the binary operation {\circ}, and it should be quite easy to check which of the {4560} laws are implied by either of these two laws. The poset {{\mathcal E}} has a reflection symmetry associated to replacing the binary operator {\circ} by its reflection {\circ^{\mathrm{op}}: (x,y) \mapsto y \circ x}, which in principle cuts down the total work by a factor of about two. Specific examples of magmas, such as the natural numbers with the addition operation, obey some set of equations in {{\mathcal E}} but not others, and so could be used to generate a large number of anti-implications. Some existing automated proving tools for equational logic, such as Prover9 and Mace4 (for obtaining implications and anti-implications respectively), could then be used to handle most of the remaining “easy” cases (though some work may be needed to convert the outputs of such tools into Lean). The remaining “hard” cases could then be targeted by some combination of human contributors and more advanced AI tools.

Perhaps, in analogy with formalization projects, we could have a semi-formal “blueprint” evolving in parallel with the formal Lean component of the project. This way, the project could accept human-written proofs by contributors who do not necessarily have any proficiency in Lean, as well as contributions from automated tools (such as the aforementioned Prover9 and Mace4), whose output is in some other format than Lean. The task of converting these semi-formal proofs into Lean could then be done by other humans or automated tools; in particular I imagine modern AI tools could be particularly valuable for this portion of the workflow. I am not quite sure though if existing blueprint software can scale to handle the large number of individual proofs that would be generated by this project; and as this portion would not be formally verified, a significant amount of human moderation might also be needed here, and this also might not scale properly. Perhaps the semi-formal portion of the project could instead be coordinated on a forum such as this blog, in a similar spirit to past Polymath projects.

It would be nice to be able to integrate such a project with some sort of graph visualization software that can take an incomplete determination of the poset {{\mathcal E}} as input (in which each potential comparison {E \implies E'} in {{\mathcal E}} is marked as either true, false, or open), completes the graph as much as possible using the axioms of partial order, and then presents the partially known poset in a visually appealing way. If anyone knows of such a software package, I would be happy to hear of it in the comments.

Anyway, I would be happy to receive any feedback on this project; in addition to the previous requests, I would be interested in any suggestions for improving the project, as well as gauging whether there is sufficient interest in participating to actually launch it. (I am imagining running it vaguely along the lines of a Polymath project, though perhaps not formally labeled as such.)

UPDATE, Sep 30 2024: The project is up and running (and highly active), with the main page being this Github repository. See also the Lean Zulip chat for some (also very active) discussion on the project.

Ben Krause, Hamed Mousavi, Joni Teräväinen, and I have just uploaded to the arXiv the paper “Pointwise convergence of bilinear polynomial averages over the primes“. This paper builds upon a previous result of Krause, Mirek, and myself, in which we demonstrated the pointwise almost everywhere convergence of the ergodic averages

\displaystyle  \frac{1}{N} \sum_{n=1}^N f(T^n x) g(T^{P(n)} x) \ \ \ \ \ (1)

as {N \rightarrow \infty} and almost all {x \in X}, whenever {(X,T,\mu)} is a measure-preserving system (not necessarily of finite measure), and {f \in L^{p_1}(X,\mu)}, {g \in L^{p_2}(X,\mu)} for some {1 < p_1,p_2 < \infty} with {1/p_1 + 1/p_2 \leq 1}, where {P} is a polynomial with integer coefficients and degree at least two. Here we establish the prime version of this theorem, that is to say we establish the pointwise almost everywhere convergence of the averages

\displaystyle  \frac{1}{\pi(N)} \sum_{p \leq N} f(T^p x) g(T^{P(p)} x)

under the same hypotheses on {(X,T,\mu)}, {f, g}. By standard arguments this is equivalent to the pointwise almost everywhere convergence of the weighted averages

\displaystyle  \frac{1}{N} \sum_{n \leq N} \Lambda(n) f(T^n x) g(T^{P(n)} x) \ \ \ \ \ (2)

where {\Lambda} is the von Mangoldt function. Our argument also borrows from results in a recent paper of Teräväinen, who showed (among other things) that the similar averages

\displaystyle  \frac{1}{N} \sum_{n \leq N} \mu(n) f(T^n x) g(T^{P(n)} x)

converge almost everywhere (quite fast) to zero, at least if {X} is assumed to be finite measure. Here of course {\mu} denotes the Möbius function.

The basic strategy is to try to insert the weight {\Lambda} everywhere in the proof of the convergence of (1) and adapt as needed. The weighted averages are bilinear averages associated to the bilinear symbol

\displaystyle  (\xi_1,\xi_2) \mapsto \frac{1}{N} \sum_{n \leq N} \Lambda(n) e(n \xi_1 + P(n) \xi_2).

In the unweighted case, results from the additive combinatorics theory of Peluse and Prendiville were used to essentially reduce matters to the contribution where {\xi_1,\xi_2} were “major arc”, at which point this symbol could be approximated by a more tractable symbol. Setting aside the Peluse-Prendiville step for now, the first obstacle is that the natural approximation to the symbol does not have sufficiently accurate error bounds if a Siegel zero exists. While one could in principle fix this by adding a Siegel correction term to the approximation, we found it simpler to use the arguments of Teräväinen to essentially replace the von Mangoldt weight {\Lambda} by a “Cramér approximant”

\displaystyle  \Lambda_{\mathrm{Cramer}, w}(n) := \frac{W}{\phi(W)} 1_{(n,W)=1}

where {W = \prod_{p \leq w} p} and {w} is a parameter (we make the quasipolynomial choice {w = \exp(\log^{1/C_0} N)} for a suitable absolute constant {N}). This approximant is then used for most of the argument, with relatively routine changes; for instance, an {L^p} improving estimate needs to be replaced by a weighted analogue that is relatively easy to establish from the unweighted version due to an {L^2} smoothing effect, and a sharp {p}-adic bilinear averaging estimate for large {p} can also be adapted to handle a suitable {p}-adic weight by a minor variant of the arguments. The most tricky step is to obtain a weighted version of the Peluse-Prendiville inverse theorem. Here we encounter the technical problem that the Cramér approximant, despite having many good properties (in particular, it is non-negative and has well-controlled correlations thanks to the fundamental lemma of sieve theory), is not of “Type I”, which turns out to be quite handy when establishing inverse theorems. So for this portion of the argument, we switch from the Cramér approximant to the Heath-Brown approximant

\displaystyle  \Lambda_{\mathrm{HB},Q}(n) := \sum_{q<Q} \frac{\mu(q)}{\phi(q)} c_q(n)

where {c_q(n)} is the Ramanujan sum

\displaystyle  c_q(n) := \sum_{r \in ({\bf Z}/q{\bf Z})^\times} e(-rn/q).

While this approximant is no longer non-negative, it is of Type I, and thus well suited for inverse theory. In our paper we set up some basic comparison theorems between {\Lambda}, {\Lambda_{\mathrm{Cramer},w}}, and {\Lambda_{\mathrm{HB},Q}} in various Gowers uniformity-type norms, which allows us to switch relatively easily between the different weights in practice; hopefully these comparison theorems will be useful in other applications as well.

Archives