# Matematički blogovi

### 254A, Notes 3: Local well-posedness for the Euler equations

We now turn to the local existence theory for the initial value problem for the incompressible Euler equations

Formally, the Euler equations (with normalised pressure) arise as the vanishing viscosity limit of the Navier-Stokes equations

that was studied in previous notes. However, because most of the bounds established in previous notes, either on the lifespan of the solution or on the size of the solution itself, depended on , it is not immediate how to justify passing to the limit and obtain either a strong well-posedness theory or a weak solution theory for the limiting equation (1). (For instance, weak solutions to the Navier-Stokes equations (or the approximate solutions used to create such weak solutions) have lying in for , but the bound on the norm is and so one could lose this regularity in the limit , at which point it is not clear how to ensure that the nonlinear term still converges in the sense of distributions to what one expects.)

Nevertheless, by carefully using the energy method (which we will do loosely following an approach of Bertozzi and Majda), it is still possible to obtain *local-in-time* estimates on (high-regularity) solutions to (3) that are uniform in the limit . Such *a priori* estimates can then be combined with a number of variants to obtain a satisfactory local well-posedness theory for the Euler equations. Among other things, we will be able to establish the *Beale-Kato-Majda criterion* – smooth solutions to the Euler (or Navier-Stokes) equations can be continued indefinitely unless the integral

becomes infinite at the final time , where is the *vorticity* field. The vorticity has the important property that it is transported by the Euler flow, and in two spatial dimensions it can be used to establish global regularity for both the Euler and Navier-Stokes equations in these settings. (Unfortunately, in three and higher dimensions the phenomenon of vortex stretching has frustrated all attempts to date to use the vorticity transport property to establish global regularity of either equation in this setting.)

There is a rather different approach to establishing local well-posedness for the Euler equations, which relies on the *vorticity-stream* formulation of these equations. This will be discussed in a later set of notes.

** — 1. A priori bounds — **

We now develop some *a priori* bounds for very smooth solutions to Navier-Stokes that are uniform in the viscosity . Define an function to be a function that lies in every space; similarly define an function to be a function that lies in for every . Given divergence-free initial data , an mild solution to the Navier-Stokes initial value problem (3) is a solution that is an mild solution for all . From the (non-periodic version of) Corollary 40 of Notes 1, we know that for any divergence-free initial data , there is a unique maximal Cauchy development , with infinite if is finite.

Here are our first bounds:

**Theorem 1 (A priori bound)** Let be an maximal Cauchy development to (3) with initial data .

- (i) For any integer , we have
Furthermore, if for a sufficiently small constant depending only on , then

- (ii) For any and integer , one has

The hypothesis that is an integer can be dropped by more heavily exploiting the theory of paraproducts, but we shall restrict attention to the integer case for simplicity.
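Since the argument below revolves around controlling Sobolev norms, a brief hedged aside may help: the following sketch (the normalisation, grid size, and single-mode test are arbitrary choices of illustration, not anything from the notes) computes a discrete H^k-type norm of periodic data via Plancherel, weighting each Fourier mode by (1 + |xi|^2)^k.

```python
import numpy as np

def sobolev_norm_sq(f, k=2, L=2 * np.pi):
    """Squared H^k-type norm of periodic samples f on [0, L), via Plancherel:
    sum over modes of (1 + |xi|^2)^k |f_hat(xi)|^2, with normalised coefficients."""
    n = len(f)
    f_hat = np.fft.fft(f) / n                     # normalised Fourier coefficients
    xi = np.fft.fftfreq(n) * n * (2 * np.pi / L)  # physical frequencies (integers for L = 2*pi)
    return float(np.sum((1 + xi**2) ** k * np.abs(f_hat) ** 2))

x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
print(sobolev_norm_sq(np.sin(x), k=2))  # (1+1)^2 * (1/4 + 1/4) = 2.0 (up to rounding)
```

For sin(x) the only modes are at frequency ±1, each with coefficient of squared modulus 1/4, so the k = 2 norm squared is (1+1)^2 · (1/4 + 1/4) = 2, which the sketch reproduces.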

We now prove this theorem using the energy method. Using the Navier-Stokes equations, we see that and all lie in for any ; an easy iteration argument then shows that the same is true for all higher derivatives of also. This will make it easy to justify the differentiation under the integral sign that we shall shortly perform.

Let be an integer. For each time , we introduce the energy-type quantity

Here we think of as taking values in the Euclidean space . This quantity is of course comparable to , up to constants depending on . It is easy to verify that is continuously differentiable in time, with derivative

where we suppress explicit dependence on in the integrand for brevity. We now try to bound this quantity in terms of . We expand the right-hand side in coordinates using (3) to obtain

where

For , we can integrate by parts to move the operator onto and use the divergence-free nature of to conclude that . Similarly, we may integrate by parts for to move one copy of over to the other factor in the integrand to conclude

so in particular (note that as we are seeking bounds that are uniform in , we can’t get much further use out of beyond this bound). Thus we have

Now we expand out using the Leibniz rule. There is one dangerous term, in which all the derivatives in fall on the factor, giving rise to the expression

But we can locate a total derivative to write this as

and then an integration by parts using as before shows that this term vanishes. Estimating the remaining contributions to using the triangle inequality, we arrive at the bound

At this point we now need a variant of Proposition 35 from Notes 1:

**Exercise 2** Let be integers. For any , show that

(*Hint:* for or , use Hölder’s inequality. Otherwise, use a suitable Littlewood-Paley decomposition.)

Using this exercise and Hölder’s inequality, we see that

By Gronwall’s inequality we conclude that

for any and , which gives part (ii).

Now assume . Then we have the Sobolev embedding

which when inserted into (4) yields the differential inequality

or equivalently

for some constant (strictly speaking one should work with for some small which one sends to zero later, if one wants to avoid the possibility that vanishes, but we will ignore this small technicality for the sake of exposition). Since , we conclude that stays bounded for a time interval of the form ; this, together with the blowup criterion that must go to infinity as , gives part (i).
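The lifespan bound here comes from integrating a differential inequality that is superlinear in the energy. As a hedged sanity check (a model with all constants set to 1, not anything from the notes), one can confirm numerically that solutions of the model ODE y' = y^(3/2) blow up at time 2/sqrt(y0), so the guaranteed time of control shrinks like an inverse power of the size of the initial data:

```python
def blowup_time(y0, dt=1e-5, threshold=1e12):
    """Forward-Euler integrate the model ODE y' = y**1.5 until y exceeds threshold.

    Closed form: y(t) = (y0**-0.5 - t/2)**-2, which blows up at t* = 2/sqrt(y0);
    the numerical crossing time of a large threshold approximates t*.
    """
    y, t = y0, 0.0
    while y < threshold:
        y += dt * y**1.5
        t += dt
    return t

print(round(blowup_time(1.0), 2))  # ~ 2.0
print(round(blowup_time(4.0), 2))  # ~ 1.0
```

Doubling the size of the initial data halves the time interval on which the model stays under control, mirroring the shape of the lifespan bound in part (i).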

As a consequence, we can now obtain local existence for the Euler equations from smooth data:

**Corollary 3 (Local existence for smooth solutions)** Let be divergence-free. Let be an integer, and set

Then there is a smooth solution , to (1) with all derivatives of in for appropriate . Furthermore, for any integer , one has

*Proof:* We use the compactness method, which will be more powerful here than in the last section because we have much higher regularity uniform bounds (but they are only local in time rather than global). Let be a sequence of viscosities going to zero. By the local existence theory for Navier-Stokes (Corollary 40 of Notes 1), for each we have a maximal Cauchy development , to the Navier-Stokes initial value problem (3) with viscosity and initial data . From Theorem 1(i), we have for all (if is small enough), and

for all . By Sobolev embedding, this implies that

and then by Theorem 1(ii) one has

for every integer . Thus, for each , is bounded in , uniformly in . By repeatedly using (3) and product estimates for Sobolev spaces, we see the same is true for , and for all higher derivatives of . In particular, all derivatives of are equicontinuous.

Using weak compactness (Proposition 2 of Notes 2), one can pass to a subsequence such that converge weakly to some limits , such that and all their derivatives lie in on ; in particular, are smooth. From the Arzelà-Ascoli theorem (and Proposition 3 of Notes 2), and converge locally uniformly to , and similarly for all derivatives of . One can then take limits in (3) and conclude that solve (1). The bound (5) follows from taking limits in (6).

**Remark 4** We are able to easily pass to the zero viscosity limit here because our domain has no boundary. In the presence of a boundary, we cannot freely differentiate in space as casually as we have been doing above, and one no longer has bounds on higher derivatives on and near the boundary that are uniform in the viscosity. Instead, it is possible for the fluid to form a thin boundary layer that has a non-trivial effect on the limiting dynamics. We hope to return to this topic in a future set of notes.

We have constructed a local smooth solution to the Euler equations from smooth data, but have not yet established uniqueness or continuous dependence on the data; related to the latter point, we have not extended the construction to larger classes of initial data than the smooth class . To accomplish these tasks we need a further *a priori* estimate, now involving *differences* of two solutions, rather than just bounding a single solution:

**Theorem 5 (A priori bound for differences)** Let , let be an integer, and let be divergence-free with norm at most . Let

where is sufficiently small depending on . Let and be an solution to (1) with initial data (this exists thanks to Corollary 3), and let and be an solution to (1) with initial data . Then one has

Note the asymmetry between and in (8): this estimate requires control on the initial data in the high regularity space in order to be usable, but has no such requirement on the initial data . This asymmetry will be important in some later applications.

*Proof:* From Corollary 3 we have

Now we need bounds on the difference . Initially we have , where . To evolve later in time, we will need to use the energy method. Subtracting (1) for and , we have

By hypothesis, all derivatives of and lie in on , which will allow us to justify the manipulations below without difficulty. We introduce the low regularity energy for the difference:

Arguing as in the proof of Theorem 1, we see that

where

As before, the divergence-free nature of ensures that vanishes. For , we use the Leibniz rule and again extract out the dangerous term

which again vanishes by integration by parts. We then use the triangle inequality to bound

Using Exercise 2 and Hölder, we may bound this by

which by Sobolev embedding gives

Applying (9) and Gronwall’s inequality, we conclude that

for , and (7) follows.

Now we work with the high regularity energy

Arguing as before we have

Using Exercise 2 and Hölder, we may bound this by

Using Sobolev embedding we thus have

By the chain rule, we obtain

(one can work with in place of and then send later if one wishes to avoid a lack of differentiability at ). By Gronwall’s inequality, we conclude that

for all , and (8) follows.

By specialising (7) (or (8)) to the case where , we see that the solution constructed in Corollary 3 is unique. Now we can extend the construction to wider classes of initial data. The following result is essentially due to Kato and to Swann (with a similar result obtained by different methods by Ebin-Marsden):

**Proposition 6** Let be an integer, and let be divergence-free. Set

where is sufficiently small depending on . Let be a sequence of divergence-free vector fields converging to in norm (for instance, one could apply Littlewood-Paley projections to ). Let , be the associated solutions to (1) provided by Corollary 3 (these are well-defined for large enough). Then and converge in norm on to limits , respectively, which solve (1) in a distributional sense.

*Proof:* We use a variant of Kato’s argument (see also the paper of Bona and Smith for a related technique). It will suffice to show that the form a Cauchy sequence in , since the algebra properties of then give the same for , and one can then easily take limits (in this relatively high regularity setting) to obtain the limiting solution that solves (1) in a distributional sense.

Let be a large dyadic integer. By Corollary 3, we may find an solution to the Euler equations (1) with initial data (which lies in ). From Theorem 5, one has

Applying the triangle inequality and then taking limit superior, we conclude that

But by Plancherel’s theorem and dominated convergence we see that

as , and hence

giving the claim.

**Remark 7** Since the sequence can converge to at most one limit , we see that the solution to (1) is unique in the class of distributional solutions that are limits of smooth solutions (with initial data of those solutions converging to in ). However, this leaves open the possibility that there are other distributional solutions that do not arise as the limits of smooth solutions (or as limits of smooth solutions whose initial data only converge to in a weaker sense). It is possible to recover some uniqueness results for fairly weak solutions to the Euler equations if one also assumes some additional regularity on the fields (or on related fields such as the vorticity ). In two dimensions, for instance, there is a celebrated theorem of Yudovich that weak solutions to 2D Euler are unique if one has an bound on the vorticity (together with some other technical conditions). In higher dimensions one can also obtain uniqueness results if one assumes that the solution is in a high-regularity space such as , . See for instance this paper of Chae for an example of such a result.

**Exercise 8 (Continuous dependence on initial data)** Let be an integer, let , and set , where is sufficiently small depending on . Let be the closed ball of radius around the origin of divergence-free vector fields in . The above proposition provides a solution to the associated initial value problem. Show that the map from to is a continuous map from to .

**Remark 9** The continuity result provided by the above exercise is not as strong as for Navier-Stokes, where the solution map is in fact Lipschitz continuous (see e.g., Exercise 43 of Notes 1). In fact, for the Euler equations, which are classified as “quasilinear” rather than “semilinear” due to the lack of the dissipative term in the equation, the solution map is not expected to be uniformly continuous on this ball, let alone Lipschitz continuous. See this previous blog post for some more discussion.

**Exercise 10 (Maximal Cauchy development)** Let be an integer, and let be divergence free. Show that there exists a unique and unique , with the following properties:

- (i) If and is divergence-free and converges to in norm, then for large enough, there is an solution to (1) with initial data on , and furthermore and converge in norm on to .
- (ii) If , then as .
- (iii) If , then we have the *weak Beale-Kato-Majda criterion*

Furthermore, show that do not depend on the particular choice of , in the sense that if belongs to both and for two integers then the time and the fields produced by the above claims are the same for both and .

We will refine part (iii) of the above exercise in the next section. It is a major open problem as to whether the case (i.e., finite time blowup) can actually occur. (It is important here that we have some spatial decay at infinity, as represented here by the presence of the norm; when the solution is allowed to diverge at spatial infinity, it is not difficult to construct smooth solutions to the Euler equations that blow up in finite time; see e.g., this article of Stuart for an example.)

**Remark 11** The condition that recurs in the above results can be explained using the heuristics from Section 5 of Notes 1. Assume that at a given time , the velocity field fluctuates at a spatial frequency , with the fluctuations being of amplitude . (We however permit the velocity field to contain a “bulk” low frequency component which can have much higher amplitude than ; for instance, the first component of might take the form where is a quantity much larger than .) Suppose one considers the trajectories of two particles whose separation at time zero is comparable to the wavelength of the frequency oscillation. Then the relative velocities of the two particles will differ by about , so one would expect the particles to stay roughly the same distance from each other up to time , and then exhibit more complicated and unpredictable behaviour after that point. Thus the natural time scale here is , so one only expects to have a reasonable local well-posedness theory in the regime

On the other hand, if lies in , and the frequency fluctuations are spread out over a set of volume , the heuristics from the previous notes predict that

The uncertainty principle predicts , and so

Thus we can force the regime (11) to occur if , and barely have a chance of doing so in the endpoint case , but would not expect to have a local theory (at least using the sort of techniques deployed in this section) for .

**Exercise 12** Use similar heuristics to explain the relevance of quantities of the form that occur in various places in this section.

Because the solutions constructed in Exercise 10 are limits (in rather strong topologies) of smooth solutions, it is fairly easy to extend estimates and conservation laws that are known for smooth solutions to these slightly less regular solutions. For instance:

**Exercise 13** Let be as in Exercise 10.

- (i) (Energy conservation) Show that for all .
- (ii) Show that
for all .

**Exercise 14 (Vanishing viscosity limit)** Let the notation and hypotheses be as in Corollary 3. For each , let , be the solution to (3) with this choice of viscosity and with initial data . Show that as , and converge locally uniformly to , and similarly for all derivatives of and . (In other words, there is actually no need to pass to a subsequence as is done in the proof of Corollary 3.) *Hint:* apply the energy method to control the difference .

**Exercise 15 (Local existence for forced Euler)** Let be divergence-free, and let , thus is smooth and for any and any integer and , . Show that there exists and a smooth solution to the forced Euler equation

*Note:* one will first need a local existence theory for the forced Navier-Stokes equation. It is also possible to develop forced analogues of most of the other results in this section, but we will not detail this here.

** — 2. The Beale-Kato-Majda blowup criterion — **

In Exercise 10 we saw that we could continue solutions to the Euler equations indefinitely in time, unless the integral became infinite at some finite time . There is an important refinement of this blowup criterion, due to Beale, Kato, and Majda, in which the tensor is replaced by the vorticity two-form (or vorticity, for short)

that is to say is essentially the anti-symmetric component of . Whereas is the tensor field

is the anti-symmetric tensor field

**Remark 16** In two dimensions, is essentially a scalar, since and . As such, it is common in fluid mechanics to refer to the scalar field as the vorticity, rather than the two form . In three dimensions, there are three independent components of the vorticity, and it is common to view as a vector field rather than a two-form in this case (actually, to be precise would be a pseudovector field rather than a vector field, because it behaves slightly differently to vectors with respect to changes of coordinates). With this interpretation, the vorticity is now the curl of the velocity field . From a differential geometry viewpoint, one can view the two-form as an antisymmetric bilinear map from vector fields to scalar functions , and the relation between the vorticity two-form and the vorticity (pseudo-)vector field in is given by

for arbitrary vector fields , where is the volume form on , which can be viewed in three dimensions as an antisymmetric trilinear form on vector fields. The fact that is a pseudovector rather than a vector then arises from the fact that the volume form changes sign upon applying a reflection.
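To make the two-dimensional picture of Remark 16 concrete, here is a small hedged numerical sketch (the grid, the test field, and the use of finite differences are arbitrary illustrative choices, not from the notes) computing the 2D scalar vorticity, the combination of first derivatives of the velocity components, for solid-body rotation u = (-y, x), whose vorticity is the constant 2:

```python
import numpy as np

# Grid on [-1, 1]^2; solid-body rotation u = (-y, x) has scalar vorticity
# d(u2)/dx - d(u1)/dy = 1 - (-1) = 2 everywhere.
n = 101
x = np.linspace(-1.0, 1.0, n)
X, Y = np.meshgrid(x, x, indexing="ij")
u1, u2 = -Y, X

# np.gradient differentiates along the requested axis; with indexing="ij",
# axis 0 is the x-direction and axis 1 is the y-direction.
du2_dx = np.gradient(u2, x, axis=0)
du1_dy = np.gradient(u1, x, axis=1)
omega = du2_dx - du1_dy
```

Since the velocity field is linear in the coordinates, the finite differences are exact and the computed vorticity equals 2 at every grid point.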

The point is that vorticity behaves better under the Euler flow than the full derivative . Indeed, if one takes a smooth solution to the Euler equation in coordinates

and applies to both sides, one obtains

If one interchanges and then subtracts, the pressure terms disappear, and one is left with

which we can rearrange using the material derivative as

Writing and , this becomes the *vorticity equation*
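In coordinates, with D_t = ∂_t + u·∇ the material derivative and ω_{ij} = ∂_i u_j − ∂_j u_i, the equation obtained by this computation takes the form (summing over the repeated index k):

```latex
D_t \omega_{ij} = -(\partial_i u_k)\,\omega_{kj} + (\partial_j u_k)\,\omega_{ki}.
```

In three dimensions, rewritten in terms of the vorticity pseudovector ω = ∇ × u, this is the familiar equation ∂_t ω + (u·∇)ω = (ω·∇)u, whose right-hand side is the vortex stretching term mentioned earlier.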

The vorticity equation is particularly simple in two and three dimensions:

**Exercise 17 (Transport of vorticity)** Let be a smooth solution to Euler equation in , and let be the vorticity two-form.

- (i) If , show that
- (ii) If , show that
where is the vorticity pseudovector.

**Remark 18** One can interpret the vorticity equation in the language of differential geometry, which is a more convenient formalism when working on more general Riemannian manifolds than . To be consistent with the conventions of differential geometry, we now write the components of the velocity field as rather than (and the coordinates of as rather than ). Define the *covelocity -form* as

where is the Euclidean metric tensor (in the standard coordinates, is the Kronecker delta, though can take other values than if one uses a different coordinate system). Thus in coordinates, ; the covelocity field is thus the musical isomorphism applied to the velocity field. The vorticity -form can then be interpreted as the exterior derivative of the covelocity, thus

or in coordinates

The Euler equations can be rearranged as

where is the Lie derivative along , which for -forms is given in coordinates as

and is the modified pressure

If one takes exterior derivatives of both sides of (14) using the basic differential geometry identities and , one obtains the vorticity equation

where the Lie derivative for -forms is given in coordinates as

and so we recover (13) after some relabeling.

We now present the Beale-Kato-Majda condition.

**Theorem 19 (Beale-Kato-Majda)** Let be an integer, and let be divergence free. Let , be the maximal Cauchy development from Exercise 10, and let be the vorticity.

The double exponential in (i) is not a typo! It is an open question, though, whether this double exponential bound can be improved at all, even in the simplest case of two spatial dimensions.

We turn to the proof of this theorem. Part (ii) will be implied by part (i), since if is finite then part (i) gives a uniform bound on as , preventing finite time blowup. So it suffices to prove part (i). To do this, it suffices to do so for solutions, since one can then pass to a limit (using the strong continuity in ) to establish the general case. In particular, we can now assume that are smooth.

We would like to convert control on back to control of the full derivative . If one takes the divergence of the vorticity using (12) and the divergence-free nature of , we see that

Thus, we can recover the derivative from the vorticity by the formula

where one can define via the Fourier transform as a multiplier bounded on every space.
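The Fourier-multiplier inversion just described can be carried out concretely. Below is a hedged numerical sketch of the closely related two-dimensional special case (periodic box, single-mode test field, all choices arbitrary): the divergence-free velocity is recovered from its scalar vorticity via a stream function, applying the inverse Laplacian as a Fourier multiplier and then taking a perpendicular gradient.

```python
import numpy as np

n = 64
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")

# A divergence-free test field and its scalar vorticity d(u2)/dx - d(u1)/dy.
u1 = -np.sin(X) * np.cos(Y)
u2 = np.cos(X) * np.sin(Y)
omega = -2.0 * np.sin(X) * np.sin(Y)

k = np.fft.fftfreq(n, d=1.0 / n)  # integer wavenumbers on the 2*pi-periodic box
KX, KY = np.meshgrid(k, k, indexing="ij")
K2 = KX**2 + KY**2
K2[0, 0] = 1.0  # dummy value; the mean mode is zeroed out below

# Stream function psi = Delta^{-1} omega, then u = (-d(psi)/dy, d(psi)/dx).
psi_hat = -np.fft.fft2(omega) / K2
psi_hat[0, 0] = 0.0
u1_rec = np.real(np.fft.ifft2(-1j * KY * psi_hat))
u2_rec = np.real(np.fft.ifft2(1j * KX * psi_hat))
```

For this single-mode field the recovery is exact to machine precision; note that the inverse Laplacian is only defined up to the mean mode, which is set to zero here.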

If the operators were bounded in , then we would have

and the claimed bound (15) would follow from Theorem 1(ii) (with one exponential to spare). Unfortunately, is not quite bounded on . Indeed, from Exercise 18 of Notes 1 we have the formula

for any test function and , where is the singular kernel

If one sets to be a smooth approximation to the signum function restricted to an annulus , we conclude that the operator norm of is at least as large as

But one can calculate using polar coordinates that this expression diverges like in the limit , , giving unboundedness.

As it turns out, though, the Gronwall argument used to establish Theorem 1(ii) can just barely tolerate an additional “logarithmic loss” of the above form, albeit at the cost of worsening the exponential term to a double exponential one. The key lemma is the following result that quantifies the logarithmic divergence indicated by the previous calculation, and is similar in spirit to a well known inequality of Brezis and Wainger.

**Lemma 20 (Near-boundedness of )** For any and , one has

The lower order terms will be easily dealt with in practice; the main point is that one can almost bound the norm of by that of , up to a logarithmic factor.

*Proof:* By a limiting argument we may assume that is a test function. We apply Littlewood-Paley decomposition to write

and hence by the triangle inequality we may bound the left-hand side of (17) by

where we omit the domain and range from the function space norms for brevity.

By Bernstein’s inequality we have

Also, from Bernstein and Plancherel we have

and hence by geometric series we have

for any . This gives an acceptable contribution if we select . This leaves the remaining values of to control, for which it suffices to establish the bound

Observe from applying the scaling (that is, replacing with ) that to prove (18) for all it suffices to do so for . By Fourier analysis, the function is the convolution of with the inverse Fourier transform of the function

This function is a test function, so is a Schwartz function, and the claim now follows from Young’s inequality.
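The Littlewood-Paley decomposition used in this proof can be mimicked numerically. The following hedged 1D sketch (with sharp rather than smooth frequency cutoffs, which suffices for illustration, and an arbitrary test signal) splits a periodic signal into dyadic frequency blocks and checks that the blocks sum back to the original signal:

```python
import numpy as np

n = 256
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
f = np.sin(3 * x) + 0.5 * np.cos(17 * x) + 0.1 * np.sin(40 * x)

f_hat = np.fft.fft(f)
k = np.abs(np.fft.fftfreq(n) * n)  # |frequency| of each mode, ranging over 0..128

def block(j):
    """Sharp dyadic projection onto frequencies 2^(j-1) < |k| <= 2^j (j = 0: 0 < |k| <= 1)."""
    lo = 2.0 ** (j - 1) if j > 0 else 0.0
    mask = (k > lo) & (k <= 2.0**j)
    return np.real(np.fft.ifft(np.where(mask, f_hat, 0.0)))

mean_mode = np.real(np.fft.ifft(np.where(k == 0, f_hat, 0.0)))
pieces = [block(j) for j in range(8)]  # dyadic blocks covering 0 < |k| <= 128
recon = mean_mode + sum(pieces)
```

Each frequency of the signal lands in exactly one dyadic block (the frequency-40 component, for instance, sits in the block covering 32 < |k| <= 64), and summing the blocks reconstructs the signal exactly.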

We return now to the proof of (15). We adapt the proof of Theorem 1(i). As in that theorem, we introduce the higher energy

We no longer have the viscosity term as , but that term was discarded anyway in the analysis. From (4) we have

Applying (16), (20) one thus has

From Exercise 13 one has

By the chain rule, one then has

and hence by Gronwall’s inequality one has

The claim (15) follows.
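The double exponential can already be seen at the level of a model ODE. Applying Gronwall with a logarithmic loss leads to an inequality of the rough shape y' <= C y (1 + log y); as a hedged numerical sketch (constants set to 1, nothing from the notes), the substitution z = 1 + log y linearises this model, and its solution grows doubly exponentially:

```python
import numpy as np

def model(y0, T=2.0, dt=1e-5):
    """Forward-Euler integrate the model ODE y' = y * (1 + log y) up to time T."""
    y = y0
    for _ in range(int(round(T / dt))):
        y += dt * y * (1.0 + np.log(y))
    return y

def closed_form(y0, T=2.0):
    # Substituting z = 1 + log y turns the model into z' = z, so
    # y(T) = exp((1 + log y0) * e^T - 1): exponential in an exponential of T.
    return float(np.exp((1.0 + np.log(y0)) * np.exp(T) - 1.0))
```

A logarithmic loss in the Gronwall step thus worsens exponential-in-time control to double-exponential-in-time control, which is exactly the shape of the bound in Theorem 19(i).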

**Remark 21** The Beale-Kato-Majda criterion can be sharpened a little bit, by replacing the sup norm with slightly smaller norms, such as the bounded mean oscillation (BMO) norm of , basically by improving the right-hand side of Lemma 20 slightly. See for instance this paper of Planchon and the references therein.

**Remark 22** An inspection of the proof of Theorem 19 reveals that the same result holds if the Euler equations are replaced by the Navier-Stokes equations; the energy estimates acquire an additional “” term by doing so (as in the proof of Theorem 1), but the sign of that term is favorable.

We now apply the Beale-Kato-Majda criterion to obtain global well-posedness for the Euler equations in two dimensions:

**Theorem 23 (Global well-posedness)** Let be as in Exercise 10. If , then .

This theorem will be immediate from Theorem 19 and the following conservation law:

**Proposition 24 (Conservation of vorticity distribution)** Let be as in Exercise 10 with . Then one has

for all and .

*Proof:* By a limiting argument it suffices to show the claim for , thus we need to show

By another limiting argument we can take to be an solution. By the monotone convergence theorem (and Sobolev embedding), it suffices to show that

whenever is a test function that vanishes in a neighbourhood of the origin . Note that as and all its derivatives are in on for every , it is Lipschitz in space and time, which among other things implies that the level sets are compact for every , and so is smooth and compactly supported in . We may therefore differentiate under the integral sign to obtain

where we omit explicit dependence on for brevity. By Exercise 17(i), the right-hand side is

which one can write as a total derivative

which vanishes thanks to integration by parts and the divergence-free nature of . The claim follows.

The above proposition shows that in two dimensions, is constant, and so the integral cannot diverge for finite . Applying Theorem 19, we obtain Theorem 23.
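Proposition 24 reflects the fact that in two dimensions the vorticity is simply rearranged by a measure-preserving flow. As a hedged illustration (the rotational flow, Gaussian profile, and grid are arbitrary choices, and the rotation is used only as an example of a measure-preserving flow, not as a self-consistent Euler solution), transporting a vorticity profile along a solid-body rotation leaves integrals of functions of the vorticity unchanged up to grid error:

```python
import numpy as np

n = 400
x = np.linspace(-2.0, 2.0, n)
X, Y = np.meshgrid(x, x, indexing="ij")
dA = (x[1] - x[0]) ** 2

def omega0(A, B):
    # An off-centre Gaussian vorticity blob (numerically compactly supported).
    return np.exp(-((A - 0.5) ** 2 + B**2) / 0.1)

def omega_at(t):
    # Transport along solid-body rotation: omega(t, x) = omega0(R(-t) x).
    c, s = np.cos(t), np.sin(t)
    return omega0(c * X + s * Y, -s * X + c * Y)

def F_integral(w):
    # Integral of F(omega) over the plane, with the sample choice F(w) = w^2.
    return float(np.sum(w**2) * dA)

I0 = F_integral(omega_at(0.0))
I1 = F_integral(omega_at(1.0))
```

Because the rotation is measure preserving, the two integrals agree (here to near machine precision), even though the vorticity profile itself has moved; this is the mechanism that keeps the Beale-Kato-Majda integral under control in two dimensions.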

One can adapt this argument to the Navier-Stokes equations:

**Exercise 25** Let be an integer, let , let be divergence-free, and let , be a maximal Cauchy development to the Navier-Stokes equations with initial data . Let be the vorticity.

- (i) Establish the vorticity equation .
- (ii) Show that for all and . (Note: to adapt the proof of Proposition 24, one should restrict attention to functions that are convex on the range of on, say, . The case of this inequality can also be established using the maximum principle for parabolic equations.)
- (iii) Show that .

**Remark 26** If solve the Euler equations on some time interval with initial data , then the time-reversed fields solve the Euler equations on the reflected interval with initial data . Because of this time reversal symmetry, the local and global well-posedness theory for the Euler equations can also be extended backwards in time; for instance, in two dimensions any divergence free initial data leads to an solution to the Euler equations on the whole time interval . However, the Navier-Stokes equations are very much *not* time-reversible in this fashion.

### Monte Verità registration open

Registration for the Winter School on trace functions at Monte Verità is now open. As explained on the web page, because the number of participants is limited, you should send an email to this address to indicate your interest. The school is primarily intended for PhD students, so please indicate who your PhD advisor is; if you are a postdoc, please indicate your motivation for attending the school.

We will then send to the selected participants the link to the official registration page.

### Postdocs at ETH

Like every year, the mathematics department of ETH offers some postdoc positions. This year, a slightly different organization has been chosen, combining some resources with the FIM. The positions are now called “Hermann Weyl Instructors”, and the main change (besides slightly earlier deadlines) is that the teaching duties are clearly stated upfront: the postdoc should teach 50% during two semesters of the three year position. (So if we compare with the Veblen Instructorships offered by Princeton and the IAS, for example, we request one year teaching/two year research, instead of two year teaching/one year research for the Veblen position).

The web page with information on the positions (including salary, which follows a standard ETH scale) is available on the FIM website. The deadline for application is November 1 (for full consideration — slightly later applications are permitted, but depending on the schedule of the committee meetings, they might be too late). Finally, the application can be done using this form.

### 254A, Notes 2: Weak solutions of the Navier-Stokes equations

In the previous set of notes we developed a theory of “strong” solutions to the Navier-Stokes equations. This theory, based around viewing the Navier-Stokes equations as a perturbation of the linear heat equation, has many attractive features: solutions exist locally, are unique, depend continuously on the initial data, have a high degree of regularity, can be continued in time as long as a sufficiently high regularity norm is under control, and tend to enjoy the same sort of conservation laws that classical solutions do. However, it is a major open problem as to whether these solutions can be extended to be (forward) global in time, because the norms that we know how to control globally in time do not have high enough regularity to be useful for continuing the solution. Also, the theory becomes degenerate in the inviscid limit .

However, it is possible to construct “weak” solutions which lack many of the desirable features of strong solutions (notably, uniqueness, propagation of regularity, and conservation laws) but can often be constructed globally in time even when one is unable to do so for strong solutions. Broadly speaking, one usually constructs weak solutions by some sort of “compactness method”, which can generally be described as follows.

- Construct a sequence of “approximate solutions” to the desired equation, for instance by developing a well-posedness theory for some “regularised” approximation to the original equation. (This theory often follows similar lines to those in the previous set of notes, for instance using such tools as the contraction mapping theorem to construct the approximate solutions.)
- Establish some *uniform* bounds (over appropriate time intervals) on these approximate solutions, even in the limit as an approximation parameter is sent to zero. (Uniformity is key; *non-uniform* bounds are often easy to obtain if one puts enough “mollification”, “hyper-dissipation”, or “discretisation” in the approximating equation.)
- Use some sort of “weak compactness” (e.g., the Banach-Alaoglu theorem, the Arzelà-Ascoli theorem, or the Rellich compactness theorem) to extract a subsequence of approximate solutions that converge (in a topology weaker than that associated to the available uniform bounds) to a limit. (Note that there is no reason *a priori* to expect such limit points to be unique, or to have any regularity properties beyond that implied by the available uniform bounds.)
- Show that this limit solves the original equation in a suitable weak sense.

The quality of these weak solutions is very much determined by the type of uniform bounds one can obtain on the approximate solution; the stronger these bounds are, the more properties one can obtain on these weak solutions. For instance, if the approximate solutions enjoy an energy identity leading to uniform energy bounds, then (by using tools such as Fatou’s lemma) one tends to obtain energy *inequalities* for the resulting weak solution; but if one somehow is able to obtain uniform bounds in a higher regularity norm than the energy then one can often recover the full energy *identity*. If the uniform bounds are at the regularity level needed to obtain well-posedness, then one generally expects to upgrade the weak solution to a strong solution. (This phenomenon is often formalised through *weak-strong uniqueness* theorems, which we will discuss later in these notes.) Thus we see that as far as attacking global regularity is concerned, both the theory of strong solutions and the theory of weak solutions encounter essentially the same obstacle, namely the inability to obtain uniform bounds on (exact or approximate) solutions at high regularities (and at arbitrary times).

For simplicity, we will focus our discussion in these notes on finite energy weak solutions on . There is a completely analogous theory for periodic weak solutions on (or equivalently, weak solutions on the torus ), which we will leave to the interested reader.

In recent years, a completely different way to construct weak solutions to the Navier-Stokes or Euler equations has been developed that is not based on the above compactness methods, but instead on techniques of convex integration. These will be discussed in a later set of notes.

** — 1. A brief review of some aspects of distribution theory — **

We have already been using the concept of a distribution in previous notes, but we will rely more heavily on this theory in this set of notes, so we pause to review some key aspects of the theory. A more comprehensive discussion of distributions may be found in this previous blog post. To avoid some minor subtleties involving complex conjugation that are not relevant for this post, we will restrict attention to real-valued (scalar) distributions here. (One can then define vector-valued distributions (taking values in a finite-dimensional vector space) as a vector of scalar-valued distributions.)

Let us work in some non-empty open subset of a Euclidean space (which may eventually correspond to space, time, or spacetime). We recall that is the space of (real-valued) test functions . It has a rather subtle topological structure (see previous notes) which we will not detail here. A (real-valued) distribution on is a continuous linear functional from test functions to the reals . (This pairing may also be denoted or in other texts.) There are two basic examples of distributions to keep in mind:

- Any locally integrable function gives rise to a distribution (which by abuse of notation we also call ) by the formula .
- Any Radon measure gives rise to a distribution (which we will again call ) by the formula . For instance, if , the Dirac mass at is a distribution with .

Two distributions are equal in the sense of distributions if for all . For instance, it is not difficult to show that two locally integrable functions are equal in the sense of distributions if and only if they agree almost everywhere, and two Radon measures are equal in the sense of distributions if and only if they are identical.

As a general principle, any “linear” operation that makes sense for “nice” functions (such as test functions) can also be defined for distributions, but any “nonlinear” operation is unlikely to be usefully defined for arbitrary distributions (though it may still be a good concept to use for distributions with additional regularity). For instance, one can take a partial derivative (known as the weak derivative) of any distribution by the definition

for all . Note that this definition agrees with the “strong” or “classical” notion of a derivative when is a smooth function, thanks to integration by parts. Similarly, if is smooth, one can define the product distribution by the formula

for all . One can also take linear combinations of two distributions in the usual fashion, thus

for all and .
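Since weak derivatives are defined entirely through pairings with test functions, simple cases can be checked numerically. The following sketch is my own illustration (the example f(x) = |x| and the choice of test function are not from the notes): the defining pairing -⟨f, φ'⟩ of the weak derivative agrees with ⟨sgn, φ⟩, i.e. the weak derivative of |x| is the sign function.

```python
import numpy as np

# Illustration (f(x) = |x| is my own choice of example): the weak derivative
# f' is *defined* by <f', phi> := -<f, phi'> for test functions phi.  For
# f(x) = |x| this pairing agrees with <sign, phi>, so |x|' = sign(x) weakly.

xs, dx = np.linspace(-5, 5, 200001, retstep=True)
# an asymmetric "test function" (a Gaussian is not compactly supported,
# but its tails are negligible on this window)
phi = np.exp(-(xs - 1) ** 2)
dphi = -2 * (xs - 1) * np.exp(-(xs - 1) ** 2)  # its classical derivative

f = np.abs(xs)
lhs = -np.sum(f * dphi) * dx          # -<f, phi'>
rhs = np.sum(np.sign(xs) * phi) * dx  # <sign, phi>
print(lhs, rhs)  # the two pairings agree (about 1.49 for this phi)
```

The agreement is exact in the limit of fine grids, by the integration by parts argument in the text.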

**Exercise 1** Let be a connected open subset of . Let be a distribution on such that in the sense of distributions for all . Show that is a constant, that is to say there exists such that in the sense of distributions.

A sequence of distributions is said to converge in the weak-* sense or *converge in the sense of distributions* to another distribution if one has

as for every test function ; in this case we write . This notion of convergence is sometimes referred to also as weak convergence (and one writes instead of ), although there is a subtle distinction between weak and weak-* convergence in non-reflexive spaces and so I will try to avoid this terminology (though in many cases one will be working in a reflexive space in which there is no distinction).

The linear operations alluded to above tend to be continuous in the distributional sense. For instance, it is easy to see that if , then for all , and for any smooth ; similarly, if , , and , are sequences of real numbers, then .

Suppose that one places a norm or seminorm on . Then one can define a subspace of the space of distributions, namely the space of all distributions for which the norm

is finite. For instance, if is the norm for some , then is just the dual space (with the (equivalence classes of) locally integrable functions in identified with distributions as above).

We have the following version of the Banach-Alaoglu theorem which allows us to easily create sequences that converge in the sense of distributions:

**Proposition 2 (Variant of Banach-Alaoglu)** Suppose that is a norm or seminorm on which makes the space separable. Let be a bounded sequence in . Then there is a subsequence of the which converges in the sense of distributions to a limit .

*Proof:* By hypothesis, there is a constant such that

for all . For each given , we may thus pass to a subsequence of such that converges to a limit. Passing to a subsequence a countably infinite number of times and using the Arzelà-Ascoli diagonalisation trick, we can thus find a dense subset of (using the metric) and a subsequence of the such that the limit exists for every , and hence for every by a limiting argument and (1). If one then defines to be the function

then one can verify that is a distribution, and by (1) we will have . By construction, converges in the sense of distributions to , and we are done.

It is important to note that there is no uniqueness claimed for ; while any given subsequence of the can have at most one limit , it is certainly possible for different subsequences to converge to different limits. Also, the proposition only applies for spaces that have preduals ; this covers many popular function spaces, such as spaces for , but omits endpoint spaces such as or . (For instance, approximations to the identity are uniformly bounded in , but converge weakly to a Dirac mass, which lies outside of .)
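The final example in this paragraph is easy to see numerically. Here is a small sketch of my own (the Gaussian bump and the test function are arbitrary choices, not from the notes): the approximations to the identity have L1 norm uniformly equal to one, while their pairings against a fixed test function converge to its value at the origin, i.e. to the Dirac pairing.

```python
import numpy as np

# Sketch (my own choices of bump and test function): the approximations to the
# identity rho_n(x) = n * rho(n x) are uniformly bounded in L^1, but converge
# in the sense of distributions to the Dirac mass delta_0, which is not in L^1.

def rho(x):
    # a fixed bump of total mass one (a Gaussian, for convenience)
    return np.exp(-x ** 2) / np.sqrt(np.pi)

xs, dx = np.linspace(-10, 10, 200001, retstep=True)
phi = lambda x: np.cos(x) * np.exp(-x ** 2 / 4)  # a smooth test function, phi(0) = 1

ns = (1, 4, 16, 64)
l1_norms = [np.sum(np.abs(n * rho(n * xs))) * dx for n in ns]
pairings = [np.sum(n * rho(n * xs) * phi(xs)) * dx for n in ns]

print(l1_norms)  # all approximately 1: a uniform L^1 bound
print(pairings)  # tends to phi(0) = 1: the pairing against a Dirac mass
```

The uniform L1 bound survives the limit, but the limiting distribution is no longer represented by an L1 function.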

From the definition we see that if , then we have the Fatou-type lemma

Thus, upper bounds on the approximating distributions are usually inherited by their limit . However, it is essential to be aware that the same is not true for lower bounds; there can be “loss of mass” in the limit. The following four examples illustrate some key ways in which this can occur:

- (Escape to spatial infinity) If is a non-zero test function, and is a sequence in going to infinity, then the translations of converge in the sense of distributions to zero, even though they will not go to zero in many function space norms (such as ).
- (Escape to frequency infinity) If is a non-zero test function, and is a sequence in going to infinity, then the modulations of converge in the sense of distributions to zero (cf. the Riemann-Lebesgue lemma), even though they will not go to zero in many function space norms (such as ).
- (Escape to infinitely fine scales) If , is a sequence of positive reals going to infinity, and , then the sequence converges in the sense of distributions to zero, but will not go to zero in several function space norms (e.g. with ).
- (Escape to infinitely coarse scales) If , is a sequence of positive reals going to zero, and , then the sequence converges in the sense of distributions to zero, but will not go to zero in several function space norms (e.g. with ).

Related to this loss of mass phenomenon is the important fact that the operation of pointwise multiplication is generally *not* continuous in the distributional topology: and does *not* necessarily imply in general (in fact in many cases the products or might not even be well-defined). For instance:

- Using the escape to frequency infinity example, the functions converge in the sense of distributions to zero, but their squares instead converge in the sense of distributions to , as can be seen from the double angle formula .
- Using the escape to infinitely fine scales example, the functions converge in the sense of distributions to zero, but their squares will not if .
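The first bullet point can be checked numerically. In the sketch below (the profile, test function, and frequencies are my own choices, since the notes keep the exact functions implicit), the modulated functions pair to zero against a test function as the frequency grows, while their squares pair to half the squared profile, in accordance with the double angle formula.

```python
import numpy as np

# Sketch (my own choices of profile and test function): f_N(x) = phi(x) sin(Nx)
# converges to 0 in the sense of distributions as N -> infinity, but
# f_N^2 = phi^2 (1 - cos(2Nx)) / 2 converges to phi^2 / 2, not to 0.

xs, dx = np.linspace(-8, 8, 400001, retstep=True)
phi = np.exp(-xs ** 2)          # the fixed profile
psi = np.exp(-(xs - 1) ** 2)    # a test function to pair against

def pair(f):
    # the distributional pairing <f, psi>, by a Riemann sum
    return np.sum(f * psi) * dx

for N in (4, 32, 256):
    fN = phi * np.sin(N * xs)
    print(N, pair(fN), pair(fN ** 2))

print(pair(phi ** 2 / 2))  # the limiting value of pair(fN ** 2)
```

The oscillation cancels in the linear pairing (a Riemann-Lebesgue effect) but not in the quadratic one, which is exactly the discontinuity of multiplication described in the text.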

This lack of continuity of multiplication means that one has to take a non-trivial amount of care when applying the theory of distributions to nonlinear PDE; a sufficiently careless regard for this issue (or more generally, treating distribution theory as some sort of “magic wand”) is likely to lead to serious errors in one’s arguments.

One way to recover continuity of pointwise multiplication is to somehow upgrade distributional convergence to stronger notions of convergence. For instance, from Hölder’s inequality one sees that if converges strongly to in (thus and both lie in , and goes to zero), and converges strongly to in , then will converge strongly in to , where .

One key way to obtain strong convergence in some norm is to obtain uniform bounds in an even stronger norm – so strong that the associated space embeds compactly in the space associated to the original norm. More precisely:

**Proposition 3 (Upgrading to strong convergence)** Let be two norms on , with associated spaces of distributions. Suppose that embeds compactly into , that is to say the closed unit ball in is a compact subset of . If is a bounded sequence in that converges in the sense of distributions to a limit , then converges strongly in to as well.

*Proof:* By the Urysohn subsequence principle, it suffices to show that every subsequence of has a further subsequence that converges strongly in to . But by the compact embedding of into , every subsequence of has a further subsequence that converges strongly in to some limit , and hence also in the sense of distributions to by definition of the norm. But this subsequence also converges in the sense of distributions to , and hence , and the claim follows.

** — 2. Simple examples of weak solutions — **

We now study weak solutions for some very simple equations, as a warmup for discussing weak solutions for Navier-Stokes.

We begin with an extremely simple initial value problem, the ODE

on a half-open time interval with , with initial condition , where and are given and is the unknown. Of course, when are smooth, then the fundamental theorem of calculus gives the unique solution

for . If one integrates the identity against a test function (that is to say, one multiplies both sides of this identity by and then integrates) on , one obtains

which upon integration by parts and rearranging gives

where we extend by zero to the open set . Thus, we have

in the sense of distributions (on ). More generally, if are locally integrable functions on , we say that is a *weak solution* to the initial value problem if (4) holds in the sense of distributions on . Thanks to the fundamental theorem of calculus for locally integrable functions, we still recover the unique solution (16):

**Exercise 4** Let be locally integrable functions (extended by zero to all of ), and let . Show that the following are equivalent:

- (i) is a weak solution to the initial value problem in the sense that (4) holds in the sense of distributions on .
- (ii) One has (16) for almost all .

Now let be a finite dimensional vector space, let be a continuous function, let , and consider the initial value problem

on some forward time interval . The Picard existence theorem lets us construct such solutions when is Lipschitz continuous and is small enough, but now we are merely requiring to be continuous and not necessarily Lipschitz. As in the preceding case, we introduce the notion of a weak solution. If is locally bounded (and measurable) on , then will be locally integrable on ; we then extend by zero to be distributions on , and we say that is a *weak solution* to (5) if one has

in the sense of distributions on , or equivalently that one has the identity

for all test functions compactly supported in . In this simple ODE setting, the notion of a weak solution coincides with stronger notions of solutions:

**Exercise 5** Let be finite dimensional, let be continuous, let , and let be locally bounded and measurable. Show that the following are equivalent:

- (i) (Weak solution) is a weak solution to (5) on .
- (ii) (Mild solution) After modification on a set of measure zero, is continuous and for all .
- (iii) (Classical solution) After modification on a set of measure zero, is continuously differentiable and obeys (5) for all .

In particular, if the ODE initial value problem (5) exhibits finite time blowup for its (unique) classical solution, then it will also do so for weak solutions (with exactly the same blowup time). This will be in contrast with the situation for PDE, in which it is possible for weak solutions to persist beyond the time in which classical solutions exist.

Now we give a compactness argument to produce weak solutions (which will then be classical solutions, by the above exercise):

**Proposition 6 (Weak existence)** Let be a finite dimensional vector space, let , let , and let be a continuous function. Let be the time

Then there exists a continuously differentiable solution to the initial value problem (5) on .

*Proof:* By construction, we have

Using the Weierstrass approximation theorem (or Stone-Weierstrass theorem), we can express on as the uniform limit of Lipschitz continuous functions , such that

for all ; we can then extend in a Lipschitz continuous fashion to all of . (The Lipschitz constant of is permitted to diverge to infinity as .) We can then apply the Picard existence theorem (Theorem 8 of Notes 1): for each we have a (continuously differentiable) maximal Cauchy development of the initial value problem

with as if is finite. (We could also solve the ODE backwards in time, but will not need to do so here.) We now claim that , and furthermore that one has the uniform bound

for all and all . Indeed, if this were not the case then by continuity (and the fact that ) there would be some and some such that , and for all . But then by the fundamental theorem of calculus and the triangle inequality (and (6)) we have

a contradiction. Thus we have (8) for all and , so takes values in on . Applying (7) and (6), we conclude that

for all and all ; in particular, the are uniformly Lipschitz continuous and uniformly bounded on . Applying the Arzelà-Ascoli theorem, we can then pass to a subsequence in which the converge uniformly on to a limit , which then also takes values in . (Alternatively, one could use Proposition 2 to have converge in the sense of distributions, followed by Proposition 3 to upgrade to uniform convergence.) As converges uniformly to on , we conclude that converges uniformly to on . Since we have

in the sense of distributions (extending , by zero to ), we can take distributional limits and conclude that

in the sense of distributions, which by Exercise 5 shows that is a continuously differentiable solution to the initial value problem (5) as required.

In contrast to the Picard theory when is Lipschitz, Proposition 6 does not assert any uniqueness of the solution to the initial value problem (5). And in fact uniqueness often fails once the Lipschitz hypothesis is dropped! Consider the simple example of the scalar initial value problem

on , so the nonlinearity here is the continuous, but not Lipschitz continuous, function . Clearly the zero function is a solution to this ODE. But so is the function . In fact there is a continuum of solutions: for any , the function is a solution. Proposition 6 will select one of these solutions, but the precise solution selected will depend on the choice of approximating functions :

**Exercise 7** Let . For each , let denote the function

- (i) Show that each is Lipschitz continuous, and the converge uniformly to the function as .
- (ii) Show that the solution to the initial value problem is given by
for and

for .

- (iii) Show that as , converges uniformly to the function .
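The non-uniqueness phenomenon above is easy to observe numerically. The notes keep the nonlinearity implicit here; in the sketch below I take the classic stand-in f(x) = 2·sqrt(max(x, 0)) (an assumption on my part, continuous but not Lipschitz at 0), for which there is a one-parameter family of solutions through the origin, while a naive forward Euler scheme started exactly at zero selects the trivial one.

```python
import numpy as np

# Sketch (assumption: the nonlinearity f(x) = 2*sqrt(max(x, 0)) is my own
# stand-in for the example in the notes).  The IVP x' = f(x), x(0) = 0 admits
# the continuum of solutions x_c(t) = max(t - c, 0)^2, one for each c >= 0.

def f(x):
    return 2.0 * np.sqrt(max(x, 0.0))

def residual(c, ts):
    # sup of |x_c'(t) - f(x_c(t))| over a time grid, using exact derivatives
    vals = np.maximum(ts - c, 0.0) ** 2
    dvals = 2.0 * np.maximum(ts - c, 0.0)
    return max(abs(dv - f(v)) for v, dv in zip(vals, dvals))

ts = np.linspace(0.0, 2.0, 2001)
res = [residual(c, ts) for c in (0.0, 0.5, 1.0)]
print(res)  # all (numerically) zero: each x_c is a genuine solution

# Forward Euler started exactly at 0 never leaves the trivial branch x = 0,
# even though x(t) = t^2 is an equally valid solution:
x, h = 0.0, 1e-3
for _ in range(2000):
    x += h * f(x)
print(x)  # stays exactly 0.0
```

Which branch a given approximation scheme selects depends on the scheme, mirroring the dependence on the approximating functions in Exercise 7.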

Now we give a simple example of a weak solution construction for a PDE, namely the linear transport equation

where the initial data and a position-dependent velocity field are given, and is the unknown field.

Suppose for the moment that are smooth, with bounded. Then one can solve this problem using the method of characteristics. For any , let denote the solution to the initial value problem

The Picard existence theorem gives us a smooth maximal Cauchy development for this problem; as is bounded, this development cannot go to infinity in finite time (either forwards or backwards in time), and so the solution is global. Thus we have a well-defined map for each time . In fact we can say more:

**Exercise 8** Let the assumptions be as above.

- (i) Show the semigroup property for all .
- (ii) Show that is a homeomorphism for each .
- (iii) Show that for every , is differentiable, and the derivative obeys the linear initial value problem
(Hint: while this system formally can be obtained by differentiating (10) in , this formal differentiation requires rigorous justification. One can for instance proceed by first principles, showing that the Newton quotients approximately obey this equation, and then using a Gronwall inequality argument to compare this approximate solution to an exact solution.)

- (iv) Show that is a diffeomorphism for each ; that is to say, and its inverse are both continuously differentiable.
- (v) Show that is a smooth diffeomorphism (that is to say and its inverse are both smooth). (Caution: one may require a bit of planning to prevent the proof from becoming extremely long and tedious.)

From (10) and the chain rule we have the identity

for any smooth function (cf. the material derivative used in Notes 0). Thus, one can rewrite the initial value problem (9) as

at which point it is clear that the unique smooth solution to the initial value problem (9) is given by

Among other things, this shows that the sup norm is a conserved quantity:
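The method of characteristics lends itself to a direct numerical check. The following sketch is my own one-dimensional example (the velocity field v(x) = sin(x) and the Gaussian initial data are arbitrary smooth bounded choices, not from the notes): it recovers the solution by integrating the characteristic ODE backwards in time, and confirms that the sup norm is conserved.

```python
import numpy as np

# Sketch (my own 1-D example; v and u0 are arbitrary smooth bounded choices):
# for u_t + v u_x = 0, the solution is u(t, x) = u0(X_t^{-1}(x)), where X_t is
# the characteristic flow dX/dt = v(X).  Hence sup |u(t)| = sup |u0|.

v = lambda x: np.sin(x)         # bounded, smooth velocity field
u0 = lambda x: np.exp(-x ** 2)  # smooth, bounded initial data with sup norm 1

def flow(a, t, steps=2000):
    # integrate dX/dt = v(X), X(0) = a, with the classical RK4 scheme;
    # a negative t runs the characteristics backwards in time
    h, x = t / steps, np.asarray(a, dtype=float)
    for _ in range(steps):
        k1 = v(x)
        k2 = v(x + 0.5 * h * k1)
        k3 = v(x + 0.5 * h * k2)
        k4 = v(x + h * k3)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

t = 1.5
xs = np.linspace(-6, 6, 4001)
ut = u0(flow(xs, -t))          # u(t, x) = u0(X_{-t}(x)), since X_t^{-1} = X_{-t}
print(ut.max(), u0(xs).max())  # both 1: the sup norm is transported, not created
```

The identity X_t^{-1} = X_{-t} used here is the group property of the flow of an autonomous ODE, from Exercise 8(i).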

Now we drop the hypothesis that is bounded. One can no longer assume that the trajectories are globally defined, or even that they are defined for a positive time independent of the starting point . Nevertheless, we have

**Proposition 9 (Weak existence)** Let be smooth, and let be smooth and bounded. Then there exists a bounded measurable function which weakly solves (9) in the sense that

in the sense of distributions on (extending by zero outside of ), or equivalently that

*Proof:* By multiplying by appropriate smooth cutoff functions, we can express as the locally uniform limit of smooth bounded functions with equal to on (say) . By the preceding discussion, for each we have a smooth global solution to the initial value problem

in the sense of distributions on . By (11), the are uniformly bounded with

Thus, by Proposition 2, we can pass to a subsequence and assume that converges in the sense of distributions to an element on ; by (2) we have

Since the are all supported on , is also. Taking weak limits in (13) (multiplying first by a cutoff function to localise to a compact set if desired), we conclude that

This gives the required weak solution.

The following exercise shows that while one can construct global weak solutions, there is significant failure of uniqueness and persistence of regularity:

**Exercise 10** Set , thus we are solving the ODE

- (i) If are bounded measurable functions, show that the function defined by
for and

for is a weak solution to (14) with initial data

for and

for . (Note that one does not need to specify these functions at , since this describes a measure zero set.)

- (ii) Suppose further that , and that is smooth and compactly supported in . Show that the weak solution described in (i) is the solution constructed by Proposition 9.
- (iii) Show that there exist at least two bounded measurable weak solutions to (14) with initial data , thus showing that weak solutions are not unique. (Of course, at most one of these solutions could obey the inequality (12), so there are some weak solutions that are not constructible using Proposition 9.) Show that this lack of uniqueness persists even if one also demands that the weak solutions be smooth; conversely, show that there exist weak solutions with initial data that are discontinuous.

**Remark 11** As the above example illustrates, the loss of mass phenomenon for weak solutions arises because the approximants to those weak solutions “escape to infinity” in the limit; similarly, the loss of uniqueness phenomenon for weak solutions arises because the approximants “come from infinity” in the limit. In this particular case of a transport equation, the infinity is spatial infinity, but for other types of PDE it can be possible for approximate solutions to escape from, or come from, other types of infinity, such as frequency infinity, fine scale infinity, or coarse scale infinity. (In the former two cases, the loss of mass phenomenon will also be closely related to a loss of regularity in the weak solution.) Eliminating these types of “bad behaviour” for weak solutions is morally equivalent to obtaining uniform bounds for the approximating solutions that are strong enough to prevent such solutions from having a significant presence near infinity; in the case of Navier-Stokes, this basically corresponds to controlling such solutions uniformly in subcritical or critical norms.

** — 3. Leray-Hopf weak solutions — **

We now adapt the above formalism to construct weak solutions to the Navier-Stokes equations, following the fundamental work of Leray, who constructed such solutions on , (as before, we discard the case as being degenerate). The later work of Hopf extended this construction to other domains, but we will work solely with here for simplicity.

In the previous set of notes, several formulations of the Navier-Stokes equations were considered. For smooth solutions (with suitable decay at infinity, and in some cases a normalisation hypothesis on the pressure also), these formulations were shown to be essentially equivalent to each other. But at the very low level of regularity that weak solutions are known to have, these different formulations of Navier-Stokes are no longer obviously equivalent. As such, there is not a single notion of a “weak solution to the Navier-Stokes equations”; the notion depends on which formulation of these equations one chooses to work with. This leads to a number of rather technical subtleties when developing a theory of weak solutions. We will largely avoid these issues here, focusing on a specific type of weak solution that arises from our version of Leray’s construction.

It will be convenient to work with the formulation

of the initial value problem for the Navier-Stokes equations. Writing out the divergence as and interchanging with , we can rewrite this as

The point of this formulation is that it can be interpreted distributionally with fairly weak regularity hypotheses on . For Leray’s construction, it turns out that a natural regularity class is

basically because the norms associated to these function spaces are precisely the quantities that will be controlled by the important *energy identity* that we will discuss later. With this regularity, we have in particular that

by which we mean that

for all . Next, we need a special case of the Sobolev embedding theorem:

**Exercise 12 (Non-endpoint Sobolev embedding theorem)** Let be such that . Show that for any , one has with

(*Hint:* this non-endpoint case can be proven using the Littlewood-Paley projections from the previous set of notes.) The endpoint case of the Sobolev embedding theorem is also true (as long as ), but the proof requires the Hardy-Littlewood-Sobolev fractional integration inequality, which we will not cover here; see for instance these previous lecture notes.

We conclude that there is some for which

and hence by Hölder’s inequality

for all . (The precise value of is not terribly important for our arguments.)

Next, we invoke the following result from harmonic analysis:

**Proposition 13 (Boundedness of the Leray projection)** For any , one has the bound

for all . In particular, has a unique continuous extension to a linear map from to itself.

For , this proposition follows easily from Plancherel’s theorem. For , the proposition is more non-trivial, and is usually proven using the Calderón-Zygmund theory of singular integrals. A proof can be found for instance in Stein’s “Singular integrals”; we shall simply assume it as a black box here. We conclude that for in the regularity class (16), we have

In particular, is locally integrable in spacetime and thus can be interpreted as a distribution on (after extending by zero outside of ). Thus also can be interpreted as a distribution. Similarly for the other two terms in (15). We then say that a function in the regularity class (16) is a *weak solution* to the initial value problem (15) for some distribution if one has

in the sense of spacetime distributions on (after extending by zero outside of ). Unpacking the definition of the distributional derivatives, this is equivalent to requiring that

for all spacetime test functions .

We can now state a form of Leray’s theorem:

**Theorem 14 (Leray’s weak solutions)** Let be divergence free (in the sense of distributions), and let . Then there exists a weak solution to the initial value problem (15). Furthermore, obeys the energy inequality

for almost every .

We now prove this theorem using the same sort of scheme that was used previously to construct weak solutions to other equations. We first need to set up some approximate solutions to (15). There are many ways to do this – the traditional way being to use some variant of the Galerkin method – but we will proceed using the Littlewood-Paley projections that were already introduced in the previous set of notes. Let be a sequence of dyadic integers going to infinity. We consider solutions to the initial value problem

this is (15) except with some additional factors of inserted in the initial data and in the nonlinear term. Formally, in the limit , the factors should converge to the identity and one should recover (15); but this requires rigorous justification. The number of factors of in the nonlinear term may seem excessive, but as we shall see, this turns out to be a convenient choice as it will lead to a favourable energy inequality for these solutions.

The Fujita-Kato theory of mild solutions for (15) from the previous set of notes can be easily adapted to the initial value problem (19), because the projections are bounded on all the function spaces of interest. Thus, for any , and any divergence-free , we can define an -mild solution to (19) on a time interval to be a function in the function space

such that

(in the sense of distributions) for all ; a mild solution on is a solution that is an mild solution when restricted to every compact subinterval . Note that the frequency-localised initial data lies in every space. By a modification of the theory of the previous set of notes, we thus see that there is a maximal Cauchy development that is a smooth solution to (19) (and an mild solution for every ), with if . Note that as is divergence-free, , and preserves the divergence-free property, and projects to divergence-free functions, is divergence-free for all . Similarly, as projects to functions with Fourier transform supported on the ball in , and this property is preserved by , , and we see that also has Fourier transform supported on the ball . This (non-uniformly) bounded frequency support is the key additional feature enjoyed by our approximate solutions that has no analogue for the actual solution , and effectively serves as a sort of “discretisation” of the problem (as per the uncertainty principle).

The next step is to ensure that the approximate solutions exist globally in time, that is to say that . We can do this by exploiting the energy conservation law for this equation. Indeed for any time , define the energy

(compare with Exercise 4 from Notes 0). From (19) we know that and lie in for any and any . This very high regularity allows us to easily justify operations such as integration by parts or differentiation under the integral sign in what follows. In particular, it is easy to establish the identity

for any . Inserting (19) (and suppressing explicit dependence on for brevity), we obtain

For the second term, we integrate by parts to obtain

For the first term

we use the self-adjointness of and , the skew-adjointness of , the fact that all three of these operators (being Fourier multipliers) commute with each other to write it as

Since is divergence-free, the Leray projection acts as the identity on it, so we may write the above expression as

Recalling the rules of thumb for the energy method from the previous set of notes, we locate a total derivative to rewrite the preceding expression as

(It is here that we begin to see how important it was to have so many factors of in our approximating equation.) We may now integrate by parts (easily justified using the high regularity of ) to obtain

But is divergence-free, so vanishes. To summarise, we conclude the (differential form of the) *energy identity*

by the fundamental theorem of calculus, we conclude in particular that

for all . Among other things, this gives a uniform bound

Ordinarily, this type of bound would be too weak to combine with the blowup criterion mentioned earlier. But we know that has Fourier transform supported in , so in particular we have the reproducing formula . We may thus use the Bernstein inequality (Exercise 52 from Notes 1) and conclude that

This bound is not uniform in , but it is still finite, and so by combining with the blowup criterion we conclude that .
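The way the reproducing formula converts an energy bound into a sup-norm bound can be seen concretely on a periodic grid. The sketch below is my own discrete analogue (a sharp FFT truncation in place of a Littlewood-Paley projection, with arbitrary white-noise data): it checks a one-dimensional Bernstein-type bound, namely that the sup norm of the frequency-truncated function stays below a constant times N^{1/2} times the L2 norm, uniformly in N.

```python
import numpy as np

# Sketch (my own discrete analogue): a Bernstein-type inequality
#   || P_{<=N} f ||_{L^infty}  <=  C N^{1/2} || f ||_{L^2}
# on the circle, with P_{<=N} modelled by sharp Fourier truncation.  By
# Cauchy-Schwarz over the ~2N surviving modes, the ratio below is bounded
# (in fact by sqrt(3 / (2 pi)) here), uniformly in N.

rng = np.random.default_rng(0)
M = 2 ** 14                    # grid points on [0, 2*pi)
dx = 2 * np.pi / M

f = rng.standard_normal(M)     # rough (white noise) data on the grid
fhat = np.fft.fft(f)
freqs = np.fft.fftfreq(M, d=1.0 / M)   # integer frequencies

l2 = np.sqrt(np.sum(f ** 2) * dx)
ratios = []
for N in (16, 64, 256, 1024):
    PNf = np.fft.ifft(np.where(np.abs(freqs) <= N, fhat, 0)).real
    ratios.append(np.max(np.abs(PNf)) / (np.sqrt(N) * l2))
print(ratios)                  # bounded (well below 1) as N grows
```

The bound is non-uniform in N in exactly the sense of the text: it is finite for each fixed frequency cutoff, which is all the blowup criterion requires.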

Now we need to start taking limits as . For this we need uniform bounds. Returning to the energy identity (20), we have the uniform bounds

so in particular for any finite one has

This is enough regularity for Proposition 2 to apply, and we can pass to a subsequence of which converges in the sense of spacetime distributions in (after extending by zero outside of ) to a limit , which is in for every .

Now we work on verifying the energy inequality (18). Let be a test function with which is non-increasing on . From (20) and integration by parts we have

Taking limit inferior and using the Fatou-type lemma (2), we conclude that

Now let , take to equal on and zero outside of for some small . Then we have

The function is supported on , is non-negative, and has total mass one. By the Lebesgue differentiation theorem applied to the bounded measurable function , we conclude that for almost every , we have

as . The claim (18) follows.

It remains to show that is a weak solution of (15), that is to say that (17) holds in the sense of spacetime distributions. Certainly the smooth solution of (19) will also be a weak solution, thus

in the sense of spacetime distributions on , where we extend by zero outside of .

At this point it is tempting to just take distributional limits of both sides of (22) to obtain (17). Certainly we have the expected convergence for the linear components of the equation:

However, it is not immediately clear that

mainly because of the previously mentioned problem that multiplication is not continuous with respect to weak notions of convergence. But if we can show (23), then we do indeed recover (17) as the limit of (22), which will complete the proof of Theorem 14.

Let’s try to simplify the task of proving (23). The partial derivative operator is continuous with respect to convergence in distributions, so it suffices to show that

where

We now try to get rid of the outer Littlewood-Paley projection. We claim that

Let be a fixed time. By Sobolev embedding and (21), is bounded in , uniformly in , for some . The same is then true for , hence by Hölder’s inequality and Proposition 13, is uniformly bounded in . On the other hand, for any spacetime test function , it is not difficult (using the rapid decrease of the Fourier transform of ) to show that goes to zero in the dual space . This gives (24).

It thus suffices to show that converges in the sense of distributions to , thus one wants

for any spacetime test function . One can easily calculate that lies in the dual space to the space that and are bounded in, so it will suffice to show that converges strongly in to for sufficiently close to and any compact subset of spacetime (since the norm of outside of can be made arbitrarily small by making large enough).

Let be a dyadic integer, then we can split

The functions are uniformly bounded in by some bound , hence by Plancherel’s theorem the functions , have an norm of (assuming is large enough so that ). Indeed, by Littlewood-Paley decomposition and Bernstein’s inequality we also see that these functions have an norm of if is close enough to that the exponent of is negative. It will therefore suffice to show that

strongly in for every fixed and .

We already know that goes to zero in the sense of distributions, so (as Proposition 3 indicates) the main difficulty is to obtain compactness of the sequence. The operator localises in spatial frequency, and the restriction to localises in both space and time, however there is still the possibility of escaping to temporal frequency. To prevent this, we need some sort of equicontinuity in time. For this, we may turn to the equation (19) obeyed by . Applying , we see that

when is large enough. We have already seen that is bounded in uniformly in , so by the Bernstein inequality is bounded in (we allow the bound to depend on ). Similarly for . We conclude that is bounded in uniformly in ; taking weak limits using (2), the same is true for , and hence is bounded in . Also, is bounded in by Bernstein’s inequality; thus is equicontinuous in . By the Arzelà-Ascoli theorem and Proposition 3, must therefore go to zero uniformly, and the claim follows. This completes the proof of Theorem 14.

**Exercise 15 (Rellich compactness theorem)** Let be such that .

- (i) Show that if is a bounded sequence in that converges in the sense of distributions to a limit , then there is a subsequence which converges strongly in to (thus, for any compact set , the restrictions of to converge strongly in to the restriction of to ).
- (ii) Show that for any compact set , the linear map defined by setting to be the restriction of to is a compact linear map.
- (iii) Show that the above two claims fail at the endpoint (which of course only occurs when ).
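As a concrete illustration of why part (i) restricts to compact sets, consider the standard example of unit bumps translating off to spatial infinity: they converge to zero in the sense of distributions while keeping full $L^2$ norm globally, yet their restrictions to any fixed compact set do converge strongly to zero. A minimal numerical sketch (the Gaussian profile, domain, and shifts are arbitrary illustrative choices, with $L^2$ norms approximated by Riemann sums):

```python
import numpy as np

# Discretise a large interval and form translated Gaussian bumps f_n(x) = exp(-(x-n)^2).
x = np.linspace(-60.0, 60.0, 240001)
dx = x[1] - x[0]

def l2_norm(f, mask=None):
    """Riemann-sum approximation of the L^2 norm, optionally restricted to a subset."""
    if mask is not None:
        f = f[mask]
    return np.sqrt(np.sum(f ** 2) * dx)

shifts = [0, 5, 10, 20]
bumps = [np.exp(-(x - n) ** 2) for n in shifts]

# Global L^2 norms are constant in n: the bumps do not converge strongly to zero on all of R.
global_norms = [l2_norm(f) for f in bumps]

# Norms on the compact set K = [-1, 1] decay to zero: the restrictions do converge strongly.
K = np.abs(x) <= 1.0
local_norms = [l2_norm(f, K) for f in bumps]
```

Here the mass of each bump escapes to spatial infinity, which is precisely the failure mode that restriction to a compact set rules out.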

The weak solutions constructed by Theorem 14 have additional properties beyond the ones listed in the above theorem. For instance:

**Exercise 16** Let be as in Theorem 14, and let be a weak solution constructed using the proof of Theorem 14.

- (i) Show that is divergence-free in the sense of spacetime distributions.
- (ii) Show that there is a measure zero subset of such that one has the energy inequality
for all with . Furthermore, show that for all , the time-shifted function defined by is a weak solution to the initial value problem (15) with initial data .

- (iii) Show that after modifying on a set of measure zero, the function is continuous for any . (*Hint:* first establish this when is a test function.)

We will discuss some further properties of the Leray weak solutions in later notes.

** — 4. Weak-strong uniqueness — **

If is a (non-zero) element in a Hilbert space , and is another element obeying the inequality

then this is very far from the assertion that is equal to , since the ball of elements of obeying (25) is far larger than the single point . However, if one also possesses the information that agrees with when tested against , in the sense that

then (25) and (26) can indeed be combined to conclude that . Geometrically, this is because the above-mentioned ball is tangent to the hyperplane described by (26) at the point . Algebraically, one can establish this claim by the cosine rule computation

giving the claim.
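Spelling this out (with $u$ the fixed element of a real Hilbert space $H$ and $v$ the element obeying the norm bound (25) and the tangency condition (26); in the complex case one takes real parts), the cosine rule computation reads

```latex
\| v - u \|_H^2 = \|v\|_H^2 - 2 \langle v, u \rangle_H + \|u\|_H^2
\leq \|u\|_H^2 - 2 \|u\|_H^2 + \|u\|_H^2 = 0,
```

where the inequality applies (25) to the first term and (26) to the cross term; thus $v = u$.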

This basic argument has many variants. Here are two of them:

**Exercise 17 (Weak convergence plus norm bound equals strong convergence (Hilbert spaces))** Let be an element of a Hilbert space , and let be a sequence in which weakly converges to , that is to say that for all . Show that the following are equivalent:

- (i) .
- (ii) .
- (iii) converges *strongly* to .

**Exercise 18 (Weak convergence plus norm bound equals strong convergence ( norms))** Let be a measure space, let be an absolutely integrable non-negative function, and let be a sequence of absolutely integrable non-negative functions that converge pointwise to . Show that the following are equivalent:

- (i) .
- (ii) .
- (iii) converges strongly in to .

(*Hint:* express and in terms of the positive and negative parts of . The latter can be controlled using the dominated convergence theorem.)

**Exercise 19** Let be as in Theorem 14, and let be a weak solution constructed using the proof of Theorem 14. Show that (after modifying on a set of measure zero if necessary), converges strongly in to as . (*Hint:* use Exercise 16(iii) and Exercise 17.)

Now we give a variant relating to weak and strong solutions of the Navier-Stokes equations.

**Proposition 20 (Weak-strong uniqueness)** Let be an mild solution to the Navier-Stokes equations (15) for some , , and with . Let be a weak solution to the Navier-Stokes equation which obeys the energy inequality (18) for almost all . Then and agree almost everywhere on .

Roughly speaking, this proposition asserts that weak solutions obeying the energy inequality stay unique as long as a strong solution exists (in particular, it is unique whenever it is regular enough to be a strong solution). However, once a strong solution reaches the end of its maximal Cauchy development, there is no further guarantee of uniqueness for the rest of the weak solution. Also, there is no guarantee of uniqueness of weak solutions if the energy inequality is dropped, and indeed there is now increasing evidence that uniqueness is simply false in this case; see for instance this paper of Buckmaster and Vicol for recent work in this direction. The conditions on can be relaxed somewhat (in particular, it is possible to drop the condition ), though they still need to be “subcritical” or “critical” in nature; see for instance the classic papers of Prodi, of Serrin, and of Ladyzhenskaya, which show that weak solutions on obeying the energy inequality are necessarily unique and smooth (after time ) if they lie in the space for some exponents with and ; the endpoint case was worked out more recently by Escauriaza, Seregin, and Sverak. For a recent survey of weak-strong uniqueness results for fluid equations, see this paper of Wiedemann.

*Proof:* Before we give the formal proof, let us first give a non-rigorous proof in which we pretend that the weak solution can be manipulated like a strong solution. Then we have

and

As in the beginning of the section, the idea is to analyse the norm of the difference . Writing in the first equation and subtracting from the second equation, we obtain the *difference equation*

If we formally differentiate the energy using this equation, we obtain

(omitting the explicit dependence of the integrand on and ) which after some integration by parts (noting that is divergence-free and thus is the identity on ) formally becomes

The and terms formally cancel out by the usual trick of writing as a total derivative and integrating by parts, using the divergence-free nature of both and . For the term , we can cancel it against the term by the arithmetic mean-geometric mean inequality

to obtain

thanks to Hölder’s inequality. As is an mild solution, it lies in , which by Sobolev embedding and Hölder means that it is also in . Since , Gronwall’s inequality then should give for all , giving the claim.

Now we begin the rigorous proof, in which is only known to be a weak solution. Here, we do not directly manipulate the difference equation, but instead carefully use the equations for and as a substitute. Define and as before. From the cosine rule we have

where we drop the explicit dependence on in the integrand. From the energy inequality hypothesis (18), we have

for almost all , where we also drop explicit dependence on in the integrand. The strong solution also obeys the energy inequality; in fact we have the energy equality

as can be seen by first working with smooth solutions and taking limits using the local well-posedness theory. We conclude that

Now we work on the integral . Because we only know to solve the equation

in the sense of spacetime integrals, it is difficult to directly treat this spatial integral. Instead (similarly to the proof of the energy inequality for Leray solutions), we will first work with a proxy

where is a test function in time, which we normalise with ; eventually we will make an approximation to the indicator function of and apply the Lebesgue differentiation theorem to recover information about for almost every .

By hypothesis, we have

for any spacetime test function . We would like to apply this identity with replaced by (in order to obtain an identity involving the expression (28)). Now is not a test function; however, as is an mild solution, it has the regularity

also, using the equation (15), Sobolev embedding, Hölder’s inequality, and the hypotheses and we see that

(If one wishes, one can first obtain this bound for smooth solutions, and take limits using the local well-posedness theory.) As a consequence, one can find a sequence of test functions , such that converges to in and norm (so converges to in norm), and converges to in norm. Since lies in , lies in , and lies in by Hölder and Sobolev, we can take limits and conclude that

Since is divergence-free, and does not depend on the spatial variables, we can simplify this slightly as

and so we can write (28) as

Using the Lebesgue differentiation theorem as in the proof of Theorem 14, we conclude that for almost every , one has the identity

Applying (15), the right-hand side is

(Note that expressions such as are well defined because lie in .) We can integrate by parts (justified using the usual limiting argument and the bounds on ) and use the divergence-free nature of to write this as

Inserting this into (27), we conclude that

We write and rewrite this as

noting from the regularity , on and Sobolev embedding that one can ensure that all integrals here are absolutely convergent.

The integral can be rewritten using integration by parts as (noting that there is enough regularity to justify the integration by parts by the usual limiting argument); expressing as a total derivative and integrating by parts again using the divergence-free nature of , we see that this expression vanishes. Similarly for the term. Now we eliminate the remaining terms which are linear in :

We may integrate by parts, and write the dot product in coordinates, to write this as

Applying the Leibniz rule and the divergence-free nature of , we see that this expression vanishes. We conclude that

Now we use the Leibniz rule, the divergence-free nature of , and the arithmetic mean-geometric mean inequality to write

to obtain

and hence by Sobolev embedding we have

for almost all . Applying Gronwall’s inequality (modifying on a set of measure zero) we conclude that for almost all , giving the claim.

One application of weak-strong uniqueness results is to give (in the case at least) *partial regularity* on the weak solutions constructed by Leray, in that the solutions agree with smooth solutions on large regions of spacetime – large enough, in fact, to cover all but a measure zero set of times . Unfortunately, the complement of this measure zero set could be disconnected, and so one could have different smooth solutions agreeing with at different epochs, so this is still quite far from an assertion of global regularity of the solution. Nevertheless it is still a non-trivial and interesting result:

**Theorem 21 (Partial regularity)** Let . Let be as in Theorem 14, and let be a weak solution constructed using the proof of Theorem 14.

- (i) (Eventual regularity) There exists a time such that (after modification on a set of measure zero), the weak solution on agrees with an mild solution on with initial data (where we time shift the notion of a mild solution to start at instead of ).
- (ii) (Epochs of regularity) There exists a compact exceptional set of measure zero, such that for any time , there is a time interval containing in its interior such that on agrees almost everywhere with an mild solution on with initial data .

*Proof:* (Sketch) We begin with (i). From (18), the norm of and the norm of are finite. Thus, for any , one can find a positive measure set of times such that

which by Plancherel and Cauchy-Schwarz implies that

In particular, by Exercise 16, one can find a time such that is a weak solution on with initial data obeying the energy inequality, with

By the small data global existence theory (Theorem 45 from Notes 1), if is chosen small enough, then there is then a global mild solution on to the Navier-Stokes equations with initial data , which must then agree with by weak-strong uniqueness. This proves (i).

Now we look at (ii). In view of (i) we can work in a fixed compact interval . Let be a time, and let be a sufficiently small constant. If there is a positive measure set of times for which

then by the same argument as above (but now using well-posedness theory instead of well-posedness theory), we will be able to equate (almost everywhere) with an mild solution on for some neighbourhood of . Thus the only times for which we cannot do this are those for which one has

for almost all . In particular, for any , one can cover such times by a collection of intervals of length , such that for almost every in that interval. On the other hand, as is bounded in , the number of disjoint time intervals of this form is at most (where we allow the implied constant to depend on and ). Thus the set of exceptional times can be covered by intervals of length , and thus its closure has Lebesgue measure . Sending we see that the exceptional times are contained in a closed measure zero subset of , and the claim follows.

The above argument in fact shows that the exceptional set in part (ii) of the above theorem will have upper Minkowski dimension at most (and hence also Hausdorff dimension at most ). There is a significant strengthening of this partial regularity result due to Caffarelli, Kohn, and Nirenberg, which we will discuss in later notes.

### 254A, Notes 1: Local well-posedness of the Navier-Stokes equations

We now begin the rigorous theory of the incompressible Navier-Stokes equations

where is a given constant (the *kinematic viscosity*, or *viscosity* for short), is an unknown vector field (the *velocity field*), and is an unknown scalar field (the *pressure field*). Here is a time interval, usually of the form or . We will either be interested in spatially decaying situations, in which decays to zero as , or -periodic (or *periodic* for short) settings, in which one has for all . (One can also require the pressure to be periodic as well; this brings up a small subtlety in the uniqueness theory for these equations, which we will address later in this set of notes.) As is usual, we abuse notation by identifying a -periodic function on with a function on the torus .

In order for the system (1) to even make sense, one requires some level of regularity on the unknown fields ; this turns out to be a relatively important technical issue that will require some attention later in this set of notes, and we will end up transforming (1) into other forms that are more suitable for lower regularity candidate solutions. Our focus here will be on local existence of these solutions in a short time interval or , for some . (One could in principle also consider solutions that extend to negative times, but it turns out that the equations are not time-reversible, and the forward evolution is significantly more natural to study than the backwards one.) The study of Euler equations, in which , will be deferred to subsequent lecture notes.

As the unknown fields involve a time parameter , and the first equation of (1) involves time derivatives of , the system (1) should be viewed as describing an evolution for the velocity field . (As we shall see later, the pressure is not really an independent dynamical field, as it can essentially be expressed in terms of the velocity field without requiring any differentiation or integration in time.) As such, the natural question to study for this system is the initial value problem, in which an initial velocity field is specified, and one wishes to locate a solution to the system (1) with initial condition

for . Of course, in order for this initial condition to be compatible with the second equation in (1), we need the compatibility condition

and one should also impose some regularity, decay, and/or periodicity hypotheses on in order to be compatible with the corresponding level of regularity etc. on the solution .

The fundamental questions in the local theory of an evolution equation are that of *existence*, *uniqueness*, and *continuous dependence*. In the context of the Navier-Stokes equations, these questions can be phrased (somewhat broadly) as follows:

- (a) (Local existence) Given suitable initial data , does there exist a solution to the above initial value problem that exists for some time ? What can one say about the time of existence? How regular is the solution?
- (b) (Uniqueness) Is it possible to have two solutions of a certain regularity class to the same initial value problem on a common time interval ? To what extent does the answer to this question depend on the regularity assumed on one or both of the solutions? Does one need to normalise the solutions beforehand in order to obtain uniqueness?
- (c) (Continuous dependence on data) If one perturbs the initial conditions by a small amount, what happens to the solution and on the time of existence ? (This question tends to only be sensible once one has a reasonable uniqueness theory.)

The answers to these questions tend to be more complicated than a simple “Yes” or “No”, for instance they can depend on the precise regularity hypotheses one wishes to impose on the data and on the solution, and even on exactly how one interprets the concept of a “solution”. However, once one settles on such a set of hypotheses, it generally happens that one either gets a “strong” theory (in which one has existence, uniqueness, and continuous dependence on the data), a “weak” theory (in which one has existence of somewhat low-quality solutions, but with only limited uniqueness results (or even some spectacular failures of uniqueness) and almost no continuous dependence on data), or no satisfactory theory whatsoever. In the former case, we say (roughly speaking) that the initial value problem is *locally well-posed*, and one can then try to build upon the theory to explore more interesting topics such as global existence and asymptotics, classifying potential blowup, rigorous justification of conservation laws, and so forth. With a weak local theory, it becomes much more difficult to address these latter sorts of questions, and there are serious analytic pitfalls that one could fall into if one tries too strenuously to treat weak solutions as if they were strong. (For instance, conservation laws that are rigorously justified for strong, high-regularity solutions may well fail for weak, low-regularity ones.) Also, even if one is primarily interested in solutions at one level of regularity, the well-posedness theory at another level of regularity can be very helpful; for instance, if one is interested in smooth solutions in , it turns out that the well-posedness theory at the critical regularity of can be used to establish *globally* smooth solutions from small initial data. As such, it can become quite important to know what kind of local theory one can obtain for a given equation.

This set of notes will focus on the “strong” theory, in which a substantial amount of regularity is assumed in the initial data and solution, giving a satisfactory (albeit largely local-in-time) well-posedness theory. “Weak” solutions will be considered in later notes.

The Navier-Stokes equations are not the simplest of partial differential equations to study, in part because they are an amalgam of three more basic equations, which behave rather differently from each other (for instance the first equation is nonlinear, while the latter two are linear):

- (a) *Transport equations* such as .
- (b) *Diffusion equations* (or *heat equations*) such as .
- (c) Systems such as , , which (for want of a better name) we will call *Leray systems*.

Accordingly, we will devote some time to getting some preliminary understanding of the linear diffusion and Leray systems before returning to the theory for the Navier-Stokes equation. Transport systems will be discussed further in subsequent notes; in this set of notes, we will instead focus on a more basic example of nonlinear equations, namely the first-order *ordinary differential equation*

where takes values in some finite-dimensional (real or complex) vector space on some time interval , and is a given linear or nonlinear function. (Here, we use “interval” to denote a connected non-empty subset of ; in particular, we allow intervals to be half-infinite or infinite, or to be open, closed, or half-open.) Fundamental results in this area include the Picard existence and uniqueness theorem, the Duhamel formula, and Grönwall’s inequality; they will serve as motivation for the approach to local well-posedness that we will adopt in this set of notes. (There are other ways to construct strong or weak solutions for Navier-Stokes and Euler equations, which we will discuss in later notes.)

A key role in our treatment here will be played by the fundamental theorem of calculus (in various forms and variations). Roughly speaking, this theorem, and its variants, allow us to recast differential equations (such as (1) or (4)) as integral equations. Such integral equations are less tractable algebraically than their differential counterparts (for instance, they are not ideal for verifying conservation laws), but are significantly more convenient for well-posedness theory, basically because integration tends to increase the regularity of a function, while differentiation reduces it. (Indeed, the problem of “losing derivatives”, or more precisely “losing regularity”, is a key obstacle that one often has to address when trying to establish well-posedness for PDE, particularly those that are quite nonlinear and with rough initial data, though for nonlinear parabolic equations such as Navier-Stokes the obstacle is not as serious as it is for some other PDE, due to the smoothing effects of the heat equation.)

One weakness of the methods deployed here is that the quantitative bounds produced deteriorate to the point of uselessness in the inviscid limit , rendering these techniques unsuitable for analysing the Euler equations in which . However, some of the methods developed in later notes have bounds that remain uniform in the limit, allowing one to also treat the Euler equations.

In this and subsequent set of notes, we use the following asymptotic notation (a variant of Vinogradov notation that is commonly used in PDE and harmonic analysis). The statement , , or will be used to denote an estimate of the form (or equivalently ) for some constant , and will be used to denote the estimates . If the constant depends on other parameters (such as the dimension ), this will be indicated by subscripts, thus for instance denotes the estimate for some depending on .

** — 1. Ordinary differential equations — **

We now study solutions to ordinary differential equations (4), focusing in particular on the initial value problem when the initial state is specified. We restrict attention to *strong solutions* , in which is continuously differentiable () in the time variable, so that the derivative in (4) can be interpreted as the classical (strong) derivative, and one has the classical fundamental theorem of calculus

whenever (in this post we use the signed definite integral, thus ).

We begin with homogeneous linear equations

where is a linear operator. Using the integrating factor , where is the matrix exponential of , and noting that , we see that this equation is equivalent to

and hence from the fundamental theorem of calculus we see that if then we have the unique global solution , or equivalently

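As a quick numerical sanity check on the formula $u(t) = e^{tA} u_0$ (not needed for the rigorous theory), one can approximate the matrix exponential by a truncated Taylor series and verify by finite differences that the resulting trajectory solves $\partial_t u = Au$. The matrix, time, and step size below are arbitrary illustrative choices:

```python
import numpy as np

def expm_taylor(A, n_terms=30):
    """Approximate the matrix exponential exp(A) by a truncated Taylor series."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, n_terms):
        term = term @ A / k
        result = result + term
    return result

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # example: the generator of rotations
u0 = np.array([1.0, 0.0])
u = lambda s: expm_taylor(s * A) @ u0     # candidate solution u(s) = exp(sA) u0

# For this A one has A^2 = -I, so exp(tA) = (cos t) I + (sin t) A, a rotation matrix.
t = 0.7
rotation = np.array([[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]])

# A centred finite difference of u at time t should match A u(t), i.e. u solves u' = Au.
h = 1e-5
du_dt = (u(t + h) - u(t - h)) / (2 * h)
```

For large matrices or long times one would use a scaling-and-squaring method rather than a raw Taylor series, but for this illustration the series converges rapidly.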
More generally, if one wishes to solve the inhomogeneous linear equation

for some continuous with initial condition , then from the fundamental theorem of calculus we have a unique global solution given by

or equivalently one has Duhamel’s formula

which is continuously differentiable in time if is continuous. Intuitively, the first term represents the contribution of the initial data to the solution at time (with the factor representing the evolution from time to time ), while the integrand represents the contribution of the forcing term at time to the solution at time (with the factor representing the evolution from time to time ).
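For a concrete (scalar) instance of Duhamel’s formula, take $\partial_t u = au + f$ with constant forcing $f \equiv 1$: the formula predicts $u(t) = e^{at} u_0 + \int_0^t e^{a(t-s)}\,ds = e^{at} u_0 + (e^{at}-1)/a$, which one can confirm by approximating the Duhamel integral with a simple quadrature (all numerical parameters below are arbitrary):

```python
import math

a, u0, t = 0.5, 2.0, 1.3   # arbitrary illustrative parameters

def duhamel(t, n=20000):
    """u(t) = e^{at} u0 + integral_0^t e^{a(t-s)} f(s) ds with f = 1, by the midpoint rule."""
    ds = t / n
    integral = sum(math.exp(a * (t - (i + 0.5) * ds)) * ds for i in range(n))
    return math.exp(a * t) * u0 + integral

# Closed form obtained by evaluating the Duhamel integral exactly.
closed_form = math.exp(a * t) * u0 + (math.exp(a * t) - 1.0) / a
```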

One can apply a similar analysis to the differential inequality

where is now a scalar continuously differentiable function, are continuous functions, and is an interval containing as its left endpoint; we also assume an initial condition . Here, the natural integrating factor is , whose derivative is by the chain rule and the fundamental theorem of calculus. Applying this integrating factor to (7), we may write it as

and hence by the fundamental theorem of calculus we have

for all (compare with (6)). This is the differential form of Grönwall’s inequality. In the homogeneous case , the inequality of course simplifies to

We continue assuming that for simplicity. From the fundamental theorem of calculus, (7) (and the initial condition ) implies the integral inequality

although the converse implication of (7) from (10) is false in general. Nevertheless, there is an analogue of (9) just assuming the weaker inequality (10), and not requiring any differentiability on , at least when all functions involved are non-negative:

**Lemma 1 (Integral form of Grönwall inequality)** Let be an interval containing as left endpoint, let , and let be continuous functions obeying the inequality (10) for all . Then one has (9) for all .

*Proof:* From (10) and the fundamental theorem of calculus, the function is continuously differentiable and obeys the differential inequality

(note here that we use the hypothesis that is non-negative). Applying the differential form (9) of Gronwall’s inequality, we conclude that

The claim now follows from (10).
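One can watch the integral form of Grönwall’s inequality at work numerically: evolve $u' = a(t)u - c(t)$ with $a, c \geq 0$ (so that $u$ obeys the differential inequality $u' \leq a(t)u$, and hence the corresponding integral inequality), and check at each step that $u(t)$ stays below the Grönwall bound $u(0)\exp(\int_0^t a(s)\,ds)$. The coefficients and discretisation below are arbitrary illustrative choices:

```python
import math

# u' = a(t) u - c(t) with a, c >= 0, so u' <= a(t) u; B is the initial datum u(0).
a = lambda t: 1.0 + 0.5 * math.sin(t)
c = lambda t: 0.3 * (1.0 + math.cos(t) ** 2)
B, T, n = 1.0, 3.0, 30000
dt = T / n

u, A = B, 0.0       # A accumulates the integral of a(s) from 0 to t
bound_holds = True
for i in range(n):
    t = i * dt
    bound_holds = bound_holds and (u <= B * math.exp(A) + 1e-9)  # Gronwall bound
    u += dt * (a(t) * u - c(t))   # forward Euler step
    A += dt * a(t)
```

The slack between $u$ and the bound comes from the damping term $c$; with $c \equiv 0$ the discrete evolution saturates the bound (up to discretisation error).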

**Exercise 2** Relax the hypotheses of continuity on to that of being measurable and bounded on compact intervals. (You will need tools such as the fundamental theorem of calculus for absolutely continuous or Lipschitz functions, covered for instance in this previous set of notes.)

Gronwall’s inequality is an excellent tool for bounding the growth of a solution to an ODE or PDE, or the difference between two such solutions. Here is a basic example, one half of the Picard (or Picard-Lindelöf) theorem:

**Theorem 3 (Picard uniqueness theorem)** Let be an interval, let be a finite-dimensional vector space, let be a function that is Lipschitz continuous on every bounded subset of , and let be continuously differentiable solutions to the ODE (4), thus

on . If for some , then and agree identically on , thus for all .

*Proof:* By translating and we may assume without loss of generality that . By splitting into at most two intervals, we may assume that is either the left or right endpoint of ; by applying the time reversal symmetry of replacing by respectively, and also replacing by and , we may assume without loss of generality that is the left endpoint of . Finally, by writing as the union of compact intervals with left endpoint , we may assume without loss of generality that is compact. In particular, are bounded and hence is Lipschitz continuous with some finite Lipschitz constant on the ranges of and .

From the fundamental theorem of calculus we have

and

for every ; subtracting, we conclude

Applying the Lipschitz property of and the triangle inequality, we conclude that

By the integral form of Grönwall’s inequality, we conclude that

and the claim follows.

**Remark 4** The same result applies for infinite-dimensional normed vector spaces , at least if one requires to be continuously differentiable in the strong (Fréchet) sense; the proof is identical.

**Exercise 5 (Comparison principle)** Let be a function that is Lipschitz continuous on compact intervals. Let be an interval, and let be continuously differentiable functions such that

and

for all .

- (a) Suppose that for some . Show that for all . (*Hint:* there are several ways to proceed here. One is to try to verify the hypotheses of Grönwall’s inequality for the quantity .)
- (b) Suppose that for some . Show that for all .

Now we turn to the existence side of the Picard theorem.

**Theorem 6 (Picard existence theorem)** Let be a finite dimensional normed vector space, let , and let lie in the closed ball . Let be a function which has a Lipschitz constant of on the ball . If one sets

then there exists a continuously differentiable solution to the ODE (4) with initial data such that for all .

Note that the solution produced by this theorem is unique on , thanks to Theorem 3. We will be primarily concerned with the case , in which case the time of existence simplifies to .

*Proof:* Using the fundamental theorem of calculus, we write (4) (with initial condition ) in integral form as

Indeed, if is continuously differentiable and solves (4) with on , then (12) holds on . Conversely, if is continuous and solves (12) on , then by the fundamental theorem of calculus the right-hand side of (12) (and hence ) is continuously differentiable and solves (4) with . Thus it suffices to solve the integral equation (12) with a solution taking values in .

We can view this as a fixed point problem. Let denote the space of continuous functions from to . We give this the uniform metric

As is well known, becomes a complete metric space with this metric. Let denote the map

Let us first verify that does map to . If , then is clearly continuous. For any , one has from the triangle inequality that

by choice of , hence as claimed. A similar argument shows that is in fact a contraction on . Namely, if , then

and hence by choice of . Applying the contraction mapping theorem, we obtain a fixed point to the equation , which is precisely (12), and the claim follows.

**Remark 7** The proof extends without difficulty to infinite dimensional Banach spaces . Up to a multiplicative constant, the result is sharp. For instance, consider the linear ODE for some , with . Here, the function is of course Lipschitz with constant on all of , and the solution is of the form , hence will exit in time , which is only larger than the time given by the above theorem by a multiplicative constant.
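To see the contraction scheme of the proof in action, here is a minimal numerical sketch of Picard iteration (the function name `picard_iterate` and the trapezoidal time discretisation are our own choices, not part of the text), applied to the linear test problem u' = -u, whose exact solution is exp(-t):

```python
import numpy as np

def picard_iterate(F, u0, T, n_steps=2000, n_iters=40):
    """Approximate the fixed point of the map u(t) -> u0 + int_0^t F(u(s)) ds
    by Picard iteration on a uniform time grid, with trapezoidal quadrature
    for the integral."""
    t = np.linspace(0.0, T, n_steps + 1)
    dt = t[1] - t[0]
    u = np.full_like(t, u0)                 # initial guess: the constant function u0
    for _ in range(n_iters):
        f = F(u)
        # cumulative trapezoidal approximation of int_0^t F(u(s)) ds
        integral = np.concatenate([[0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * dt)])
        u = u0 + integral
    return t, u

# linear test problem u' = -u, u(0) = 1, exact solution exp(-t)
t, u = picard_iterate(lambda v: -v, 1.0, 0.5)
err = np.max(np.abs(u - np.exp(-t)))
```

Since the Lipschitz constant here is 1 and the time horizon is 0.5, each sweep of the loop contracts the sup-norm error by a factor of about 1/2, so the iterates converge geometrically down to the quadrature error of the grid.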

We can iterate the Picard existence theorem (and combine it with the uniqueness theorem) to conclude that there is a *maximal Cauchy development* to the ODE (4) with initial data , with the solution diverging to infinity (or “blowing up”) at the endpoint if this endpoint is finite, and similarly for (thus one has a dichotomy between global existence and finite time blowup). More precisely:

**Theorem 8 (Maximal Cauchy development)** Let be a finite dimensional normed vector space, let , and let be a function which is Lipschitz on bounded sets. Then there exists and a continuously differentiable solution to (4) with , such that if is finite, and if is finite. Furthermore, , and are unique.

*Proof:* Uniqueness follows easily from Theorem 3. For existence, let be the union of all the intervals containing for which there is a continuously differentiable solution to (4) with . From Theorem 6 contains a neighbourhood of the origin. From Theorem 3 there is a continuously differentiable solution to (4) with . If is contained in , then by Theorem 6 (and time translation) one could find a solution to (4) in a neighbourhood of such that ; by Theorem 3 we must then have , otherwise we could glue to and obtain a solution on a larger domain than , contradicting the definition of . Thus is open, and is of the form for some .

Suppose for contradiction that is finite and does not go to infinity as . Then there exists a finite and a sequence such that . Let be the Lipschitz constant of on . By Theorem 6, for each one can find a solution to (4) on with , where does not depend on . For large enough, this and Theorem 3 allow us to extend the solution outside of , contradicting the definition of . Thus we have when is finite, and a similar argument gives when is finite.

**Remark 9** Theorem 6 gives a more quantitative description of the blowup: if is finite, then for any , one must have

where is the Lipschitz constant of on . This can be used to give some explicit lower bound on blowup rates. For instance, if and behaves like for some in the sense that the Lipschitz constant of on is for any , then we obtain a lower bound

as , if is finite, and similarly when is finite. This type of blowup rate is sharp. For instance, consider the scalar ODE

where takes values in and is fixed. Then for any , one has explicit solutions on of the form

where is a positive constant depending only on . The blowup rate at is consistent with (13) and also with (11).
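This blowup behaviour can be checked numerically. The sketch below (our own construction, using the model ODE u' = u², whose exact solution 1/(u₀⁻¹ - t) blows up at time T = 1/u₀) integrates with a fixed-step RK4 scheme until the solution exceeds a large threshold, and recovers the blowup time:

```python
def blowup_time(u0, dt=1e-5, u_max=1e4):
    """Integrate the model ODE u' = u^2 with a fixed-step RK4 scheme until the
    solution exceeds u_max, and return the time reached.  The exact solution
    u(t) = 1/(1/u0 - t) blows up at time T = 1/u0."""
    f = lambda v: v * v
    t, u = 0.0, u0
    while u < u_max:
        k1 = f(u)
        k2 = f(u + 0.5 * dt * k1)
        k3 = f(u + 0.5 * dt * k2)
        k4 = f(u + dt * k3)
        u += dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return t

t_blow = blowup_time(2.0)   # exact blowup time: 1/2
```

The cutoff `u_max` keeps the step size small relative to the remaining time to blowup, so the time at which the threshold is reached differs from the true blowup time only by about 1/u_max.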

**Exercise 10 (Higher regularity)** Let the notation and hypotheses be as in Theorem 8. Suppose that is times continuously differentiable for some natural number . Show that the maximal Cauchy development is times continuously differentiable. In particular, if is smooth, then so is .

**Exercise 11 (Lipschitz continuous dependence on data)** Let be a finite-dimensional normed vector space.

- (a) Let , let be a function which has a Lipschitz constant of on the ball , and let be the quantity (11). If , and are the solutions to (4) with given by Theorem 6, show that
- (b) Let be a function which is Lipschitz on bounded sets, let , and let be the maximal Cauchy development of (4) with initial data given by Theorem 8. Show that for any compact interval , there exists an open neighbourhood of , such that for any , there exists a solution of (4) with initial data . Furthermore, the map from to is a Lipschitz continuous map from to .

**Exercise 12 (Non-autonomous Picard theorem)** Let be a finite-dimensional normed vector space, and let be a function which is Lipschitz on bounded sets. Let . Show that there exist and a continuously differentiable function solving the non-autonomous ODE

for with initial data ; furthermore one has if is finite, and if is finite. Finally, show that are unique. (*Hint:* this could be done by repeating all of the previous arguments, but there is also a way to deduce this non-autonomous version of the Picard theorem directly from the Picard theorem by adding one extra dimension to the space .)

The above theory is symmetric with respect to the time reversal of replacing with and with . However, one can break this symmetry by introducing a dissipative linear term, in which case one only obtains the forward-in-time portion of the Picard existence theorem:

**Exercise 13** Let be a finite dimensional normed vector space, let , and let lie in the closed ball . Let be a function which has a Lipschitz constant of on the ball . Let be the quantity in (11). Let be a linear operator obeying the dissipative estimates

for all and . Show that there exists a continuously differentiable solution to the ODE

with initial data such that for all .

**Remark 14** With the hypotheses of the above exercise, one can also solve the ODE backwards in time by an amount , where denotes the operator norm of . However, in the limit as the operator norm of goes to infinity, the amount to which one can evolve backwards in time goes to zero, whereas the time in which one can evolve forwards in time remains bounded away from zero, thus breaking the time symmetry.

** — 2. Leray systems — **

Now we discuss the Leray system of equations

where is given, and the vector field and the scalar field are unknown. In other words, we wish to decompose a specified function as the sum of a gradient and a divergence-free vector field . We will use the usual Lebesgue spaces of measurable functions (up to almost everywhere equivalence) defined on some measure space (which in our case will always be either or with Lebesgue measure) such that the norm is finite. (For , the norm is defined instead to be the essential supremum of .)

Proceeding purely formally, we could solve this system by taking the divergence of the first equation to conclude that

where is the Laplacian of , and then we could formally solve for as

However, if one wishes to justify this rigorously one runs into the issue that the Laplacian is not quite invertible. To sort this out and make this problem well-defined, we need to specify the regularity and decay one wishes to impose on the data and on the solution . To begin with, let us suppose that are all smooth.

We first understand the uniqueness theory for this problem. By linearity, this amounts to solving the homogeneous equation when , thus we wish to classify the smooth fields and solving the system

Of course, we can eliminate and write this as a single equation

That is to say, the solutions to this equation arise by selecting to be a (smooth) harmonic function, and to be the negative gradient of . This is consistent with our preceding discussion that identified the potential lack of invertibility of as a key issue.

By linearity, this implies that (smooth) solutions to the system (15) are only unique up to the addition of an arbitrary harmonic function to , and the subtraction of the gradient of that harmonic function from .

We can largely eliminate this lack of uniqueness by imposing further requirements on . For instance, suppose in addition that we require to all be -periodic (or *periodic* for short), thus

for and . Then the only freedom we have is to modify by an arbitrary periodic harmonic function (and to subtract the gradient of that function from ). However, by Liouville’s theorem, the only periodic harmonic functions are the constants, whose gradient vanishes. Thus the only freedom in this setting is to add a constant to . This freedom will be almost irrelevant when we consider the Euler and Navier-Stokes equations, since it is only the gradient of the pressure which appears in those equations, rather than the pressure itself. Nevertheless, if one wishes, one could remove this freedom by requiring that be of mean zero: .

Now suppose instead that we only require that and be -periodic, but do not require to be -periodic. Then we have the freedom to modify by a harmonic function which need not be -periodic, but whose gradient is -periodic. Since the gradient of a harmonic function is also harmonic, has to be constant, and so is an affine-linear function. Conversely, all affine-linear functions are harmonic, and their gradients are constant and thus also -periodic. Thus, one has the freedom in this setting to add an arbitrary affine-linear function to , and subtract the constant gradient of that function from .

Instead of periodicity, one can also impose decay conditions on the various functions. Suppose for instance that we require the pressure to lie in an space for some ; roughly speaking, this forces the pressure to decay to zero at infinity “on the average”. Then we only have the freedom to modify by a harmonic function that is also in the class (and modify by the negative gradient of this harmonic function). However, the mean value property of harmonic functions implies that

for any ball of some radius centred around , where denotes the measure of the ball. By Hölder’s inequality, we conclude that

Sending we conclude that vanishes identically; thus there are no non-trivial harmonic functions in . Thus there is uniqueness for the problem (15) if we require the pressure to lie in . If instead we require the vector field to be in , then we can modify by a harmonic function with in , thus vanishes identically and hence is constant. So if we require then we only have the freedom to adjust by arbitrary constants.

Having discussed uniqueness, we now turn to existence. We begin with the periodic setting in which are required to be -periodic and smooth, so that they can also be viewed (by slight abuse of notation) as functions on the torus . The system (15) is linear and translation-invariant, which strongly suggests that one solve the system using the Fourier transform (which tends to diagonalise linear translation-invariant equations, because the plane waves that underlie the Fourier transform are the eigenfunctions of translation.) Indeed, we may expand as Fourier series

where the Fourier coefficients , , are given by the formulae

When are smooth, then are rapidly decreasing as , which will allow us to justify manipulations such as interchanging summation and derivatives without difficulty. Expanding out (15) in Fourier series and then comparing Fourier coefficients (which are unique for smooth functions), we obtain the system

for each . As mentioned above, the Fourier transform has *diagonalised* the system (15), in that there are no interactions between different frequencies , and we now have a decoupled system of vector equations. To solve these equations, we can take the inner product of both sides of (18) with and apply (19) to conclude that

For non-zero , we can then solve for and hence by the formulae

and

For , these formulae no longer apply; however from (18) we see that , while can be arbitrary (which corresponds to the aforementioned freedom to add an arbitrary constant to ). Thus we have the explicit general solution

where is an arbitrary constant. Note that if is smooth, then is rapidly decreasing and the functions defined by the above formulae are also smooth.

We can write the above general solution in a form similar to (16), (17) as

where, *by definition*, the inverse Laplacian of a smooth periodic function of mean zero is given by the Fourier series formula

(Note that automatically has mean zero.) It is easy to see that for such functions , thus justifying the choice of notation. We refer to as the (periodic) Leray projection of and denote it , thus in the above solution we have . By construction, is divergence-free, and vanishes whenever is a gradient .
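The periodic Leray projection is easy to realise numerically with the fast Fourier transform. The following sketch (a two-dimensional discretisation of the Fourier-series formulae above; the function name and the test fields are our own choices) checks the two defining properties: gradients are annihilated, and divergence-free fields are left unchanged.

```python
import numpy as np

def leray_project(f):
    """Periodic Leray projection P on the 2-torus, computed with the FFT:
    on each nonzero frequency k, replace fhat(k) by fhat(k) - k (k.fhat(k))/|k|^2;
    the k = 0 (mean) mode is left untouched."""
    N = f.shape[-1]
    k = np.fft.fftfreq(N, d=1.0 / N)              # integer frequencies
    k1, k2 = np.meshgrid(k, k, indexing="ij")
    fhat = np.fft.fft2(f)                         # componentwise 2D FFT
    ksq = k1 ** 2 + k2 ** 2
    ksq[0, 0] = 1.0                               # avoid 0/0; the dot product vanishes there anyway
    dot = (k1 * fhat[0] + k2 * fhat[1]) / ksq
    phat = np.stack([fhat[0] - k1 * dot, fhat[1] - k2 * dot])
    return np.real(np.fft.ifft2(phat))

N = 64
x = np.linspace(0, 2 * np.pi, N, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
grad = np.stack([np.cos(X) * np.sin(Y), np.sin(X) * np.cos(Y)])   # gradient of sin(x)sin(y)
divfree = np.stack([-np.sin(Y), np.zeros_like(X)])                # a divergence-free field
err_grad = np.max(np.abs(leray_project(grad)))               # P annihilates gradients
err_div = np.max(np.abs(leray_project(divfree) - divfree))   # P fixes divergence-free fields
```

Since the test fields contain only low frequencies, the sampled trigonometric functions have exact discrete Fourier coefficients, and both errors are at the level of floating-point roundoff.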

If we require to be -periodic, but do not require to be -periodic, then by the previous uniqueness discussion, the general solution is now

where and are arbitrary.

The above discussion was for smooth periodic functions , but one can make the same construction in other function spaces. For instance, recall that for any , the Sobolev space consists of those elements of whose Sobolev norm

is finite, where we use the “Japanese bracket” convention . (One can also define Sobolev spaces for negative , but we will not need them here.) Basic properties of these Sobolev spaces can be found in this previous post. From comparing Fourier coefficients we see that the operators and defined for smooth periodic functions can be extended without difficulty to (taking values in and respectively), with bounds of the form

Thus, if , then one can solve (15) (in the sense of distributions, at least) with some and , with bounds

In particular, the Leray projection is bounded on . (In fact it is a contraction; see Exercise 16.)

One can argue similarly in the non-periodic setting, as long as one avoids the one-dimensional case which contains some technical divergences. Recall (see e.g., these previous lecture notes on this blog) that functions have a Fourier transform , which for in the dense subclass of is defined by the formula

and then is extended to the rest of by continuous extension in the topology, taking advantage of the Plancherel identity

The Fourier transform is then extended to tempered distributions in the usual fashion (see this previous set of notes).

We then define the Sobolev space for to be the collection of those functions for which the norm

is finite; equivalently, one has

where the Fourier multiplier is defined by

For any vector-valued function in the Schwartz class, we define to be the scalar tempered distribution whose (distributional) Fourier transform is given by the formula

and define the Leray projection to be the vector-valued distribution

or in terms of the (distributional) Fourier transform

Then by using the well-known relationship

between (distributional) derivatives and (distributional) Fourier transforms we see that the tempered distributions

solve the equation (15) in the distributional sense, and hence also in the classical sense since have rapidly decreasing Fourier transforms and are thus smooth.

As in the periodic case we see that we have the bound

for all Schwartz vector fields (in fact is again a contraction), so we can extend the Leray projection without difficulty to functions. The operator can similarly be extended continuously to a map from to the space of scalar tempered distributions with gradient in , although we will not need to work directly with the pressure much in this course. This allows us to solve (15) in a distributional sense for all .

**Remark 15** For the principal value in the distribution (21) is not necessary since the expression is locally integrable in that case, thanks to the Cauchy-Schwarz inequality; but when it becomes needed due to the failure of to be locally integrable in two dimensions.

**Exercise 16 (Hodge decomposition)** Define the following three subspaces of the Hilbert space :

- is the space of all elements of of the form (in the sense of distributions) for some ;
- is the space of all elements of that are weakly harmonic in the sense that (in the sense of distributions).
- is the space of all elements of which take the form
(with the usual summation conventions) for some tensor obeying the antisymmetry property .

- (a) Show that these three spaces are closed subspaces of , and one has the orthogonal decomposition
This is a simple case of a more general splitting known as the Hodge decomposition, which is available for more general differential forms on manifolds.

- (b) Show that on , the Leray projection is the orthogonal projection to .
- (c) Show that the Leray projection is a contraction on for all .

**Exercise 17 (Helmholtz decomposition)** Define the following two subspaces of the Hilbert space :

- is the space of functions which are divergence-free, by which we mean that in the sense of distributions.
- is the space of functions which are curl-free, by which we mean that in the sense of distributions, where is the rank two tensor with components .

- (a) Show that these two spaces are closed subspaces of , and one has the orthogonal decomposition
This is known as the Helmholtz decomposition (particularly in the three-dimensional case , in which one can interpret as the curl of ).

- (b) Show that on , the Leray projection is the orthogonal projection to .
- (c) Show that the Leray projection is a contraction on for all .

**Exercise 18 (Singular integral form of Leray projection)** Let . Then the function is locally integrable and thus well-defined as a distribution.

- (a) For , show that the distribution , defined on test functions by the formula
can be expressed in principal value form as

where denotes the surface area of the unit sphere in and is the Kronecker delta.

- (b) Conclude in particular the Newtonian potential identity
where (at the risk of a mild notational clash) is the Dirac delta distribution at .

- (c) For a test vector field , establish the explicit form
- (d) Extend part (c) to the case . (*Hint:* replace the role of with , in the spirit of the replica trick from physics.)

**Remark 19** One can also solve (15) in -based Sobolev spaces for exponents other than by using Calderón-Zygmund theory and the singular integral form of the Leray projection given in Exercise 18. However, we will try to avoid having to rely on this theory in these notes.

** — 3. The heat equation — **

We now turn to the study of the heat equation

on a spacetime region , with initial data , where is a fixed constant; we also consider the inhomogeneous analog

Formally, the solution to the initial value problem for (23) should be given by , and (by the Duhamel formula(6)) the solution to (24) should similarly be

but there are subtleties arising from the unbounded nature of .

The first issue is that even if vanishes and is required to be smooth without any decay hypothesis at infinity, one can have non-uniqueness. The following counterexample is basically due to Tychonoff:

**Exercise 20 (Tychonoff example)** Let be a real number, and let .

- (a) Show that there exists a smooth, compactly supported function , not identically zero, obeying the derivative bounds
for all and . (*Hint:* one can construct as the convolution of an infinite number of approximate identities , where each is supported on an interval of length , and use the identity repeatedly. To justify things rigorously, one may need to first work with finite convolutions and take limits.)
- (b) With as in part (a), show that the function
is well-defined as a smooth function on that is compactly supported in time, and obeys the heat equation (23) for without being identically zero.

- (c) Show that the initial value problem to (23) is not unique (for any dimension ) if is only required to be smooth, even if vanishes.

**Exercise 21 (Kowalevski example)**

- (a) Let be the function . Show that there does not exist any solution to (23) that is jointly real analytic in at (that is to say, it can be expressed as an absolutely convergent power series in in a neighbourhood of ).
- (b) Modify the above example by replacing by a function that extends to an entire function on (as opposed to , which has poles at ).

This classic example, due to Sofia Kowalevski, demonstrates the need for some hypotheses on the PDE in order to invoke the Cauchy-Kowalevski theorem.

One can recover uniqueness (forwards in time) by imposing some growth condition at infinity. We give a simple example of this, which illustrates a basic tool in the subject, namely the energy method, which is based on understanding the rate of change of various “energy” integrals of integrands which primarily involve quadratic expressions of the solution or its derivatives. The reason for favouring quadratic expressions is that they are more likely to produce integrals with a definite sign (positive definite or negative definite), such as (squares of) norms or higher Sobolev norms of the solution, particularly after suitable application of integration by parts.

**Proposition 22 (Uniqueness with energy bounds)** Let , and let be smooth solutions to (24) with common initial data and forcing term such that the norm

of is finite, and similarly for . Then .

*Proof:* As the heat equation (23) is linear, we may subtract from and assume without loss of generality that , , and . By working with each component separately we may take .

Let be a non-negative test function supported on that equals on . Let be a parameter, and consider the “energy” (or more precisely, “local mass”)

for . As , we have . As is smooth and is compactly supported, depends smoothly on , and we can differentiate under the integral sign to obtain

Using (23) we thus have

using the usual summation conventions.

A basic rule of thumb in the energy method is this: whenever one is faced with an integral in which one term in the integrand has much lower regularity (or much less control on regularity) than any other, due to a large number of derivatives placed on that term, one should integrate by parts to move one or more derivatives off of that term to other terms in order to make the distribution of derivatives more balanced (which, as we shall see, tends to make the integrals easier to estimate, or to ascribe a definite sign to). Accordingly, we integrate by parts to write

The first term is non-positive, thus we may discard it to obtain the inequality

Another rule of thumb in the energy method is to keep an eye out for opportunities to express some expression appearing in the integrand as a total derivative. In this case, we can write

and then integrate by parts to move the derivative on to the much more slowly varying function to conclude

In particular we have a bound of the form

where the subscript indicates that the implied constant can depend on and . Since , we conclude from the fundamental theorem of calculus that

for all (note how it is important here that we evolve forwards in time, rather than backwards). Sending and using the dominated convergence theorem, we conclude that

and thus vanishes identically, as required.

Now we turn to existence for the heat equation, restricting attention to forward in time solutions. Formally, if one solves the heat equation (23), then on taking spatial Fourier transforms

the equation transforms to the ODE

which when combined with the initial condition gives

and hence by the Fourier inversion formula we arrive (formally, at least) at the representation

As we are assuming forward time evolution , the exponential factor here is bounded. In the case that is a Schwartz function, is also Schwartz, and this formula defines a function that is smooth in both time and space (and rapidly decreasing in space at any fixed time), and in particular lies in ; one can easily justify differentiation under the integral sign to conclude that (23) is indeed verified, and the Fourier inversion formula shows that we have the initial data condition . So this is the unique solution to the initial value problem (23) for the heat equation that lies in . By definition we declare the right-hand side of (25) to be , thus

for all and all Schwartz functions ; equivalently, one has

(One can justify this choice of notation using the functional calculus of the self-adjoint operator , as discussed for instance in this previous blog post, but we will not do so here since the Fourier transform is available as a substitute.) It is also clear from (27) that commutes with other Fourier multipliers such as or constant-coefficient differential operators, on Schwartz functions at least.

From (27) and Plancherel’s theorem we see that for is a contraction in (the Schwartz functions of) , and more generally in for any , thus

for any Schwartz and any . Thus by density one can extend the heat propagator for to all of , in a fashion that is a contraction on and more generally on . By a limiting argument, (27) holds almost everywhere for all .
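On the torus, where the propagator is given by the analogous Fourier-series multiplier, these properties are easy to verify numerically. The following discretised sketch (names and discretisation are our own; ν = 1) implements the propagator as a Fourier multiplier and checks the contraction property together with the exact decay rate of a single mode:

```python
import numpy as np

def heat_propagate(f, t, nu=1.0):
    """Discretised periodic heat propagator on [0, 2pi): multiply the k-th
    Fourier coefficient by exp(-nu t k^2), the periodic analogue of (27)."""
    N = len(f)
    k = np.fft.fftfreq(N, d=1.0 / N)
    return np.real(np.fft.ifft(np.exp(-nu * t * k ** 2) * np.fft.fft(f)))

N = 128
x = np.linspace(0, 2 * np.pi, N, endpoint=False)
l2 = lambda g: np.sqrt(np.mean(g ** 2))           # discrete L^2 norm
f = np.sign(np.sin(x))                            # rough (discontinuous) initial data
contraction_ok = l2(heat_propagate(f, 0.1)) <= l2(f)
# a single Fourier mode decays at the exact rate exp(-nu t k^2)
mode_err = np.max(np.abs(heat_propagate(np.sin(3 * x), 0.2)
                         - np.exp(-9 * 0.2) * np.sin(3 * x)))
```

The contraction holds because every Fourier coefficient is multiplied by a factor of modulus at most one; the single-mode check confirms that the multiplier is exact, not merely an approximation, on resolved frequencies.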

There is also a smoothing effect:

**Exercise 23 (Smoothing effect)** Let . Show that

for all and .
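For the multiplier computation behind estimates of this type: the symbol of √(νt) ∇ e^{νtΔ} is √(νt) ξ e^{-νt|ξ|²}, whose supremum over ξ is (2e)^{-1/2} ≈ 0.43, so one derivative costs a factor of order t^{-1/2}. A discretised sanity check on the torus (our own construction, with ν = 1):

```python
import numpy as np

N = 256
k = np.fft.fftfreq(N, d=1.0 / N)
rng = np.random.default_rng(0)
fhat = np.fft.fft(rng.standard_normal(N))        # rough random initial data, Fourier side
ratios = []
for t in np.geomspace(1e-4, 1.0, 40):
    # Fourier side of (d/dx) e^{t Delta} f : multiply by  i k exp(-t k^2)
    grad_hat = (1j * k) * np.exp(-t * k ** 2) * fhat
    ratios.append(np.sqrt(t) * np.linalg.norm(grad_hat) / np.linalg.norm(fhat))
# each ratio is bounded by sup_k sqrt(t) |k| exp(-t k^2) = (2e)^{-1/2} ~ 0.43
```

By Plancherel, each ratio is an L²-average of the multiplier over the modes present in the data, so it never exceeds the pointwise supremum of the symbol.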

**Exercise 24 (Fundamental solution for the heat equation)** For and , establish the identity

for almost every . (*Hint:* first work with Schwartz functions. Either compute the Fourier transform explicitly, or verify directly that the heat equation initial value problem is solved by the right-hand side.) Conclude in particular that (after modification on a measure zero set if necessary) is smooth for any .

**Exercise 25 (Ill-posedness of the backwards heat equation)** Show that there exists a Schwartz function with the property that there is no solution to (23) with final data for any . (*Hint:* choose so that the Fourier transform decays somewhat, but not extremely rapidly. Then argue by contradiction using (27).)
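The mechanism in this exercise is visible on the Fourier side: running the heat flow backwards multiplies the frequency-k coefficient by exp(νt|k|²), which is already enormous at moderate frequencies. A discretised illustration (our own construction, on the torus with ν = 1), amplifying a single mode of frequency 40:

```python
import numpy as np

N = 128
k = np.fft.fftfreq(N, d=1.0 / N)
x = np.linspace(0, 2 * np.pi, N, endpoint=False)

# the backwards propagator multiplies the frequency-k coefficient by exp(t k^2);
# at frequency 40 and t = 0.005 this factor is already e^8 ~ 2981
backward = lambda g, t: np.real(np.fft.ifft(np.exp(t * k ** 2) * np.fft.fft(g)))
out = backward(np.cos(40 * x), 0.005)
growth = np.max(np.abs(out))   # approximately e^8
```

This exponential frequency-dependent amplification is exactly why perturbations of the final data at high frequencies make the backwards problem ill-posed.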

**Exercise 26 (Continuity in the strong operator topology)** For any , let denote the Banach space of functions such that for each , lies in and varies continuously and boundedly in in the strong topology, with norm

Show that if and solves the heat equation on , then with

Similar considerations apply to the inhomogeneous heat equation (24). If and are Schwartz for some , then the function defined by the Duhamel formula

can easily be verified to also be Schwartz and solve (24) with initial data ; by Proposition 22, this is the only such solution in . It also obeys good estimates:

**Exercise 27 (Energy estimates)** Let and be Schwartz functions for some , and let be the solution to the equation

with initial condition given by the Duhamel formula. For any , establish the energy estimate

in two different ways:

- (i) By using the Fourier representation (27) and Plancherel’s formula;
- (ii) By using energy methods as in the proof of Proposition 22. (*Hint:* first reduce to the case . You may find the arithmetic mean-geometric mean inequality to be useful.)

Here of course we are using the norms

and

The energy estimate contains some smoothing effects similar (though not identical) to those in Exercise 23, since it shows that can in principle be one degree of regularity smoother than (if one averages in time in an sense, and the viscosity is not sent to zero), and two degrees of regularity smoother than the forcing term (with the same caveats). As we shall shortly see, this smoothing effect will allow us to handle the nonlinear terms in the Navier-Stokes equations for the purposes of setting up a local well-posedness theory.

**Exercise 28 (Distributional solution)** Let , let , and let for some . Let be given by the Duhamel formula (28). Show that (24) is true in the spacetime distributional sense, or more precisely that

in the sense of spacetime distributions for any test function supported in the interior of .
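The Duhamel formula (28) is also easy to discretise and sanity-check. The sketch below (on the torus, with trapezoidal quadrature in time and ν = 1; all names are our own) tests it against the explicitly solvable case u₀ = 0, F = sin x, for which the solution is (1 - e^{-t}) sin x:

```python
import numpy as np

def duhamel(u0, F, t, nu=1.0, M=400):
    """Sketch of the Duhamel formula on the torus [0, 2pi):
    u(t) = e^{nu t Delta} u0 + int_0^t e^{nu (t-s) Delta} F(s) ds,
    with the propagator realised as a Fourier multiplier and the time integral
    approximated by the trapezoidal rule.  F = F(s) returns the forcing profile."""
    N = len(u0)
    k = np.fft.fftfreq(N, d=1.0 / N)
    prop = lambda g, tau: np.real(np.fft.ifft(np.exp(-nu * tau * k ** 2) * np.fft.fft(g)))
    s = np.linspace(0.0, t, M + 1)
    integrand = np.array([prop(F(si), t - si) for si in s])
    w = np.full(M + 1, t / M)
    w[0] *= 0.5
    w[-1] *= 0.5                                  # trapezoid weights
    return prop(u0, t) + np.tensordot(w, integrand, axes=1)

N = 64
x = np.linspace(0, 2 * np.pi, N, endpoint=False)
u = duhamel(np.zeros(N), lambda s: np.sin(x), t=0.5)
err = np.max(np.abs(u - (1 - np.exp(-0.5)) * np.sin(x)))   # exact solution for this forcing
```

The exact answer follows by solving the decoupled ODE for the k = 1 Fourier coefficient; the only error in the sketch is the O(M^{-2}) trapezoidal quadrature error.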

Pretty much all of the above discussion can be extended to the periodic setting:

**Exercise 29**

- (a) If is smooth, define by the formula
where are the Fourier coefficients of . Show that extends continuously to a contraction on for every , and that if then the function lies in .

- (b) For and , establish the formula
for almost every , where (by abuse of notation) we identify functions with -periodic functions in the usual fashion.

- (c) If , and and are smooth, show that the function defined by (28) is smooth and solves the inhomogeneous equation (24) with initial data , and that this is the unique smooth solution to that initial value problem.
- (d) If , , and and are smooth, and is the unique smooth solution to the heat equation with , establish the energy estimate
- (e) If , and , show that the function given by (28) is in and obeys (24) in the sense of spacetime distributions (30).

**Remark 30** The heat equation for negative viscosities can be transformed into a positive viscosity heat equation by time reversal: if solves the equation , then solves the equation . Thus one can solve negative viscosity heat equations (also known as *backwards heat equations*) backwards in time, but one tends not to have well-posedness forwards in time. In a similar spirit, if is positive, one can normalise it to (say) by an appropriate rescaling of the time variable, . However, we will generally keep the parameter non-normalised in preparation for understanding the limit as .

** — 4. Local well-posedness for Navier-Stokes — **

We now have all the ingredients necessary to create a local well-posedness theory for the Navier-Stokes equations (1).

We first dispose of the one-dimensional case , which is rather degenerate as incompressible one-dimensional fluids are somewhat boring. Namely, suppose that one had a smooth solution to the one-dimensional Navier-Stokes equations

The second equation implies that is just a function of time, , and the first equation becomes

To solve this equation, one can set to be an arbitrary smooth function of time, and then set

for an arbitrary smooth function . If one requires the pressure to be bounded, then vanishes identically, and then is constant in time, which among other things shows that the initial value problem is (rather trivially) well-posed in the category of smooth solutions, up to the ability to alter the pressure by an arbitrary constant . On the other hand, if one does not require the pressure to stay bounded, then one has a lot less uniqueness, since the function is essentially unconstrained.

Now we work in two or higher dimensions , and consider solutions to (1) on the spacetime region . To begin with, we assume that is smooth and periodic in space: for ; we assume is smooth but do not place any periodicity hypotheses on it. Then, by (1), is periodic. In particular, for any and , the function has vanishing gradient and is thus constant in , so that

for all and some function of . The map is a homomorphism for fixed , so we can write for some , which will be smooth since is smooth. We thus have for some smooth -periodic function . By subtracting off the mean, we can further decompose

for some smooth function and some smooth -periodic function which has mean zero at every time.

Note that one can simply omit the constant term from the pressure without affecting the system (1). One can also eliminate the linear term by the following “generalised Galilean transformation”. If are as above, and one lets

be the primitive of , then a short calculation reveals that the smooth function defined by

solves the Navier-Stokes equations

with having the same initial data as ; conversely, if is a solution to Navier-Stokes, then so is . In particular this reveals a lack of uniqueness for the periodic Navier-Stokes equations that is essentially the same lack of uniqueness that is present for the Leray system: one can add an arbitrary spatially affine function to the pressure by applying a suitable Galilean transform to . On the other hand, we can eliminate this lack of uniqueness by requiring that the pressure be *normalised* in the sense that and , that is to say we require to be -periodic and mean zero. The above discussion shows that any smooth solution to Navier-Stokes with periodic can be transformed by a Galilean transformation to one in which the pressure is normalised.

Once the pressure is normalised, it turns out that one can recover uniqueness (much as was the case with the Leray system):

**Theorem 31 (Uniqueness with normalised pressure)** Let be two smooth periodic solutions to (1) on with normalised pressure such that . Then .

*Proof:* We use the energy method. Write , then subtracting (1) for from we see that is smooth with

and

Now we consider the energy . This varies smoothly with , and we can differentiate under the integral sign to obtain

where

and we have omitted the explicit dependence on and for brevity.

For , we observe the total derivative and integrate by parts to conclude that

since is divergence-free. Similarly, integration by parts shows that vanishes since is divergence-free. Another integration by parts gives

and hence . Finally, from Hölder’s inequality we have

and hence

Since , we conclude from Gronwall’s inequality that for all , and hence is identically zero, thus . Substituting this into (1) we conclude that ; as have mean zero, we conclude (e.g., from Fourier inversion) that , and the claim follows.
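The concluding Gronwall step can be sketched as follows; this is a reconstruction of the elided displays, writing $w = u - u'$ and $E(t) = \frac{1}{2}\|w(t)\|_{L^2}^2$ as in the energy method:

```latex
\frac{d}{dt} E(t) \;\lesssim\; \|\nabla u'(t)\|_{L^\infty}\, E(t),
\qquad E(0) = 0
\;\implies\;
E(t) \;\le\; E(0)\, \exp\Big( C \int_0^t \|\nabla u'(s)\|_{L^\infty}\, ds \Big) \;=\; 0.
```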

Now we turn to existence in the periodic setting, assuming normalised pressure. For various technical reasons, it is convenient to reduce to the case when the velocity field has zero mean. Observe that the right-hand sides , of (1) have zero mean on , thanks to integration by parts. A further integration by parts, using the divergence-free condition , reveals that the transport term also has zero mean:

Thus, we see that the mean is a conserved integral of motion: if is the mean initial velocity, and is a solution to (1) (obeying some minimal regularity hypothesis), then continues to have mean velocity for all subsequent times. On the other hand, if is a smooth periodic solution to (1) with normalised pressure and initial velocity , then the Galilean transform defined by

can be easily verified to be a smooth periodic solution to (1) with normalised pressure and initial velocity . Of course, one can reconstruct from by the inverse transformation

Thus, up to this simple transformation, solving the initial value problem for (1) for is equivalent to that of , so we may assume without loss of generality that the initial velocity (and hence the velocity at all subsequent times) has zero mean.

A general rule of thumb is that whenever an integral of a solution to a PDE can be proven to vanish (or be equal to boundary terms) by integration by parts, it is because the integrand can be rewritten in “divergence form” – as the divergence of a tensor of one higher rank. (This is because the integration by parts identity arises from the divergence form of the expression .) Thus we expect the transport term to be in divergence form. Indeed, in components we have

since we have the divergence-free condition , we thus have from the Leibniz rule that

We write this in coordinate-free notation as

where is the tensor product and denotes the divergence

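In coordinates, the computation just described is the standard identity (reconstructed here in our own index notation, with summation over repeated indices):

```latex
(u \cdot \nabla u)_i \;=\; u_j\, \partial_j u_i
  \;=\; \partial_j (u_j u_i) - u_i\, \partial_j u_j
  \;=\; \partial_j (u_j u_i)
\qquad \text{when } \partial_j u_j = 0,
% that is, u \cdot \nabla u = \nabla \cdot (u \otimes u),
% where (u \otimes u)_{ij} = u_i u_j.
```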
Thus we can rewrite (1) as the system

Next, we observe that we can use the Leray projection operator to eliminate the role of the (normalised) pressure. Namely, if are a smooth periodic solution to (1) with normalised pressure, then on applying (which preserves divergence-free vector fields such as and , but annihilates gradients such as ) we conclude an equation that does not involve the pressure at all:

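As a concrete illustration (not part of the original notes), the Leray projection on the torus can be implemented with the FFT and checked numerically: it annihilates gradients and fixes divergence-free fields. All function and variable names here are our own.

```python
import numpy as np

def leray_project(u):
    """Apply the Leray projection P = I - grad Delta^{-1} div to a velocity
    field u of shape (2, N, N) on the 2-torus, via the Fourier multiplier
    v(k) -> v(k) - k (k . v(k)) / |k|^2 for k != 0 (the k = 0 mode is fixed)."""
    n = u.shape[-1]
    k = np.fft.fftfreq(n, d=1.0 / n)              # integer frequencies
    kx, ky = np.meshgrid(k, k, indexing="ij")
    uh = np.fft.fft2(u)                           # FFT of each component
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                                # avoid 0/0 at the zero mode
    kdotu = kx * uh[0] + ky * uh[1]
    ph = np.empty_like(uh)
    ph[0] = uh[0] - kx * kdotu / k2
    ph[1] = uh[1] - ky * kdotu / k2
    return np.real(np.fft.ifft2(ph))
```

For instance, applying `leray_project` to the gradient of $\sin(x)\sin(y)$ returns the zero field (up to rounding), while a divergence-free field such as $(\cos y, \cos x)$ is returned unchanged.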
Conversely, suppose that one has a smooth periodic solution to (33) with initial condition for some smooth periodic divergence-free vector field . Taking divergences of both sides of (33), we then conclude that

that is to say obeys the heat equation (23). Since is periodic, smooth, and vanishes at , we see from Exercise 29(c) that vanishes on all of , thus is divergence free on the entire time interval . From (33) and (22) we thus see that if one defines to be the function

(which can easily be verified to be a smooth function in both space and time) then is a smooth periodic solution to (1) with normalised pressure and initial condition (and is thus the unique solution to this system, thanks to Theorem 31). Thus, the problem of finding a smooth solution to (1) in the smooth periodic setting with normalised pressure and divergence-free initial data is equivalent to that of solving (33) with the same initial data.

By Duhamel’s formula (Exercise 29(c)), any smooth solution to the initial value problem (33) with obeys the Duhamel formula

(The operator is sometimes referred to as the *Oseen operator* in the literature.) Conversely, a smooth solution to (34) will solve the initial value problem (33) with initial data .
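For the reader's convenience, the Duhamel formula (34) presumably takes the standard mild-solution form below; this is a reconstruction, and signs and normalisations should be checked against the conventions of these notes:

```latex
u(t) \;=\; e^{\nu t \Delta} u_0
  \;-\; \int_0^t e^{\nu (t-s) \Delta}\, \mathbb{P}\, \nabla \cdot \big( u(s) \otimes u(s) \big)\, ds.
```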

To obtain existence of smooth periodic solutions (with normalised pressure) to the Navier-Stokes equations with given smooth divergence-free periodic initial data , it thus suffices to find a smooth periodic solution to the integral equation (34). We will achieve this by a two-step procedure:

- (i) (Existence at finite regularity) Construct a solution to (34) in a certain function space with a finite amount of regularity (assuming that the initial data has a similarly finite amount of regularity); and then
- (ii) (Propagation of regularity) show that if is in fact smooth, then the solution constructed in (i) is also smooth.

The reason for this two step procedure is that one wishes to solve (34) using iteration-type methods (which for instance power the contraction mapping theorem that was used to prove the Picard existence theorem); however the function space that one ultimately wishes the solution to lie in is not well adapted for such iteration (for instance, it is not a Banach space, instead being merely a Fréchet space). Instead, we iterate in an auxiliary lower regularity space first, and then “bootstrap” the lower regularity to the desired higher regularity. Observe that the same situation occurred with the Picard existence theorem, where one performed the iteration in the low regularity space , even though ultimately one desired the solution to be continuously differentiable or even smooth.

Of course, to run this procedure, one actually has to write down an explicit function space in which one will perform the iteration argument. Selection of this space is actually a non-trivial matter and often requires a substantial amount of trial and error, as well as experience with similar iteration arguments for other PDE. Often one is guided by the function space theory for the linearised counterpart of the PDE, which in this case is the heat equation (23). As such, the following definition can be at least partially motivated by the energy estimates in Exercise 29(d).

**Definition 32 (Mild solution)** Let , , and let be divergence-free, where denotes the subspace of consisting of mean zero functions. An *-mild solution* (or *Fujita-Kato mild solution*) to the Navier-Stokes equations with initial data is a function in the function space

that obeys the integral equation (34) (in the sense of distributions) for all . We say that is a mild solution on if it is a mild solution on for every .

**Remark 33** The definition of a mild solution could be extended to those choices of initial data that are not divergence-free, but then this solution concept no longer has any direct connection with the Navier-Stokes equations, so we will not consider such “solutions” here. Similarly, one could also consider mild solutions without the mean zero hypothesis, but the function space estimates are slightly less favourable in this setting and so we shall restrict attention to mean zero solutions only.

Note that the regularity on places in (with plenty of room to spare), which is more than enough regularity to make sense of the right-hand side of (34). One can also define mild solutions for other function spaces than the one provided here, but we focus on this notion for now, which was introduced in the work of Fujita and Kato. We record a simple compatibility property of mild solutions:

**Exercise 34 (Splitting)** Let , , let be divergence-free, and let

Let . Show that the following are equivalent:

- (i) is an mild solution to the Navier-Stokes equations on with initial data .
- (ii) is an mild solution to the Navier-Stokes equations on with initial data , and the translated function defined by is an mild solution to the Navier-Stokes equations with initial condition .

To use this notion of a mild solution, we will need the following harmonic analysis estimate:

**Proposition 35 (Product estimate)** Let , and let . Then one has , with the estimate

When this claim follows immediately from Hölder’s inequality. For the claim is similarly immediate from the Leibniz rule and the triangle and Hölder inequalities (noting that is comparable to ). For more general the claim is not quite so immediate (for instance, when one runs into difficulties controlling the intermediate term arising in the Leibniz expansion of ). Nevertheless the bound is still true. However, to prove it we will need to introduce a tool from harmonic analysis, namely Littlewood-Paley theory, and we defer the proof to the appendix.

We also need a simple case of Sobolev embedding:

**Exercise 36 (Sobolev embedding)**

- (a) If , show that for any , one has with
- (b) Show that the inequality fails at .
- (c) Establish the same statements with replaced by throughout.

In particular, combining this exercise with Proposition 35 we see that for , is a Banach algebra:

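Explicitly, the Banach algebra property being asserted is presumably the following (reconstructed in our own notation):

```latex
\| fg \|_{H^k(\mathbb{T}^d)} \;\lesssim_{k,d}\; \| f \|_{H^k(\mathbb{T}^d)}\, \| g \|_{H^k(\mathbb{T}^d)}
\qquad (k > d/2),
% obtained by inserting the Sobolev bound \|f\|_{L^\infty} \lesssim_{k,d} \|f\|_{H^k}
% into the product estimate of Proposition 35.
```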
Now we can construct mild solutions at high regularities .

**Theorem 37 (Local well-posedness of mild solutions at high regularity)** Let , and let be divergence-free. Then there exists a time

and an mild solution to (34). Furthermore, this mild solution is unique.

The hypothesis is not optimal; we return to this point later in these notes.

*Proof:* We begin with existence. We can write (34) in the fixed point form

We remark that this expression automatically has mean zero since has mean zero. Let denote the function space

with norm

This is a Banach space. Because of the mean zero restriction on , we may estimate

Note that if , then by (35), which by Exercise 29(d) (and the fact that commutes with and is a contraction on ) implies that . Thus is a map from to . In fact we can obtain more quantitative control on this map. By using Exercise 29(d), (35), and the Hölder bound

Thus, if we set for a suitably large constant , and set for a sufficiently small constant , then maps the closed ball in to itself. Furthermore, for , we have by similar arguments to above

and hence if the constant is chosen small enough, is also a contraction (with constant, say, ) on . Thus there exists such that , thus is an mild solution.
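The structure of this contraction argument can be caricatured by a scalar model (purely illustrative; the names and constants are ours). The Duhamel map is roughly “linear evolution plus a bilinear term”, i.e. a quadratic map $\Phi(x) = a + \varepsilon x^2$, which contracts on a small ball when $a$ and $\varepsilon$ are small:

```python
def picard_iterate(a, eps, n_steps=60):
    """Iterate the toy quadratic map Phi(x) = a + eps*x^2, a scalar caricature of
    the fixed-point form u = (linear evolution of u_0) + (bilinear Duhamel term):
    'a' plays the role of the linear evolution of the data, eps*x^2 the nonlinearity.
    When 4*a*eps < 1, Phi is a contraction on a ball around 0, and the iterates
    converge to the smaller root of eps*x^2 - x + a = 0 (the 'mild solution')."""
    x = 0.0
    for _ in range(n_steps):
        x = a + eps * x * x
    return x
```

For example, with `a = 0.1` and `eps = 0.5` the iterates converge rapidly to the smaller root of $0.5x^2 - x + 0.1 = 0$, while for $4a\varepsilon > 1$ (large data relative to the lifespan) no real fixed point exists and the iteration escapes the ball.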

Now we show that it is the only mild solution. Suppose for contradiction that there is another mild solution with the same initial data . This solution might not lie in , but it will lie in for some . By the same arguments as above, if is sufficiently small depending on then will be a contraction on , which implies that and agree on . Now we apply Exercise 34 to advance in time by and iterate this process (noting that depends on but does not otherwise depend on or ) until one concludes that on all of .

Iterating this as in the proof of Theorem 8, we have

**Theorem 38 (Maximal Cauchy development)** Let , and let be divergence-free. Then there exists a time and an mild solution to (34), such that if then as . Furthermore, and are unique.

In principle, if the initial data belongs to multiple Sobolev spaces the maximal time of existence could depend on (so that the solution exits different regularity classes at different times). However, this is not the case, because there is an -independent blowup criterion:

**Proposition 39 (Blowup criterion)** Let be as in Theorem 38. If , then .

Note from Exercise 36 that is finite for any . This shows that is the unique time at which the norm “blows up” (becomes infinite) and thus is independent of .

*Proof:* Suppose for contradiction that but that the quantity was finite. Let be parameters to be optimised later. We define the norm

As is a mild solution, this expression is finite.

We adapt the proof of Theorem 37. Using Exercise 29(d) (and Exercise 34) we have

Again we discard and use (a variant of) (37) to conclude

If we now use Proposition 35 in place of (35), we conclude that

If we choose to be sufficiently close to (depending on and ), we can absorb the second term on the RHS into the LHS and conclude that

In particular, stays bounded as , contradicting Theorem 38.

**Corollary 40 (Existence of smooth solutions)** If is smooth and divergence free then there is a and a smooth periodic solution to the Navier-Stokes equations on with normalised pressure such that if , then . Furthermore, and are unique.

*Proof:* As discussed previously, we may assume without loss of generality that has mean zero. As is periodic and smooth, it lies in for every . From the preceding discussion we already have and a function that is an mild solution for every , and with if is finite. It will suffice to show that is smooth, since we know from preceding discussion that a smooth solution to (33) can be converted to a smooth solution to (1).

By Exercise 29, one has

in the sense of spacetime distributions. The right-hand side lies in for every , hence the left-hand side does also; this makes lie in . It is then easy to see that this implies that the right-hand side of the above equation lies in for every , and so now lies in for every . Iterating this (and using Sobolev embedding) we conclude that is smooth in space and time, giving the claim.

**Remark 41** When , it is a notorious open problem whether the maximal lifespan given by the above corollary is always infinite.

**Exercise 42 (Instantaneous smoothing)** Let , let be divergence-free, and let be the maximal Cauchy development provided by Theorem 38. Show that is smooth on (note the omission of the initial time ). (*Hint:* first show that is a mild solution for arbitrarily small .)

**Exercise 43 (Lipschitz continuous dependence on initial data)** Let , let , and let be divergence-free. Suppose one has an mild solution to the Navier-Stokes equations with initial data . Show that there is a neighbourhood of in (the divergence-free elements of) , such that for every , there exists an mild solution to the Navier-Stokes equations with initial data with the map from to Lipschitz continuous (using the metric for the initial data and the metric for the solution ).

Now we discuss the issue of relaxing the regularity condition in the above theory. The main inefficiency in the above arguments is the use of the crude estimate (37), which sacrifices some of the exponent in time in exchange for extracting a positive power of the lifespan that can be used to create a contraction mapping, as long as is small enough. It turns out that by using a different energy estimate than Exercise 29(d), one can avoid such an exchange, allowing one to construct solutions at lower regularity, and in particular at the “critical” regularity of . Furthermore, in the category of smooth solutions, one can even achieve the desirable goal of ensuring that the time of existence is infinite – but only provided that the initial data is small. More precisely,

**Proposition 44** Let and let . Then the function defined by the Duhamel formula

also has mean zero for all , and obeys the estimates

*Proof:* By Minkowski’s integral inequality, it will suffice to establish the bounds in the case . The first two norms of the right-hand side are already established by Exercise 29(d), so it remains to establish the estimate

By working with the rescaled function (and also rescaling ), we may normalise . By a limiting argument we may assume without loss of generality that is Schwartz. We cannot directly apply Exercise 36 here due to the failure of endpoint Sobolev embedding; nevertheless we may argue as follows. For any , we see from (31), the mean zero hypothesis, and the triangle inequality that

and hence by Cauchy-Schwarz and the bound

(which can be verified using the integral test for , while for it is easy to bound the LHS by ) we have

Integrating this in using Fubini’s theorem, we conclude

and the claim follows.

This gives the following small data global existence result, also due to Fujita and Kato:

**Theorem 45 (Small data global existence)** Suppose that is divergence-free with norm at most , where is a sufficiently small constant depending only on . Then there exists a mild solution to the Navier-Stokes equations on . Furthermore, if is smooth, then this mild solution is also smooth.

*Proof:* By working with the rescaled function , we may normalise . Let denote the Banach space of functions

with the obvious norm

Let be the Duhamel operator (36). If , then by Proposition 44 and Lemma 35 one has

In particular, maps to . A similar argument establishes the bound

for all . For small enough, will be a contraction on for some absolute constant depending only on , and hence has a fixed point which will be the desired mild solution.

Now suppose that is smooth. Let , and let be the maximal Cauchy development provided by Theorem 38. For any , if one defines

then the preceding arguments give

thus either or . On the other hand, depends continuously on and converges to as . For small enough, this implies that for all (this is an example of a “continuity argument”). Next, if we set

then repeating the previous arguments also gives

as is finite and , we conclude (for small enough) that

In particular we have

for all , and hence by Theorem 38 we have . The argument used to prove Corollary 40 shows that is smooth, and the claim follows.

**Remark 46** Modifications of this argument also allow one to establish local existence of mild solutions when the initial data lies in , but has large norm rather than norm less than . However, in this case one does not have a lower bound on the time of existence that depends only on the norm of the data, as was the case with Theorem 37. Further modification of the argument also allows one to extend Theorem 38 to the entire “subcritical” range of regularities . See the paper of Fujita and Kato for details.

We now turn attention to the non-periodic case in two and higher dimensions . The theory is largely identical, though with some minor technical differences. Unlike the periodic case, we will not attempt to reduce to the case of having mean zero (indeed, we will not even assume that is absolutely integrable, so that the mean might not even be well defined).

In the periodic case, we focused initially on smooth solutions. Smoothness is not sufficient by itself in the non-periodic setting to provide a good well-posedness theory, as we already saw in Section 3 when discussing the linear heat equation; some additional decay at spatial infinity is needed. There is some flexibility as to how much smoothness to prescribe. Let us say that a solution to Navier-Stokes is *classical* if and are smooth, and furthermore lies in for every .

Now we work on normalising the pressure. Suppose is a classical solution. As before we may write the Navier-Stokes equation in divergence form as (32). Taking a further divergence we obtain the equation

The function belongs to for every , so if we define the normalised pressure

via the Fourier transform as

then will also belong to for every . We then have for some smooth harmonic function . To control this harmonic function, we return to (32), which we write as

where

and apply the fundamental theorem of calculus to conclude that

The left-hand side is harmonic (thanks to differentiating under the integral sign), and the right-hand side lies in (in fact it is in for every ), hence both sides vanish. By the fundamental theorem of calculus this implies that vanishes identically, thus is constant in space. One can then subtract it from the pressure without affecting (1). Thus, in the category of classical solutions, at least, we may assume without loss of generality that we have *normalised pressure*

in which case the Navier-Stokes equations may be written as before as (33). (See also this paper of mine for some variants of this argument.)

**Exercise 47 (Uniqueness with normalised pressure)** Let be two smooth classical solutions to (1) on with normalised pressure such that . Then .

We can now define the notion of a Fujita-Kato mild solution as before, except that we replace all mention of the torus with the Euclidean space , and omit all requirements for the solution to be of mean zero. As stated in the appendix, the product estimate in Proposition 35 continues to hold in , so one can obtain the analogue of Theorem 37, Theorem 38, Proposition 39, and Corollary 40 on by repeating the proofs with the obvious changes; we leave the details as an exercise for the interested reader.

**Exercise 48** Establish an analogue of Proposition 44 on , using the homogeneous Sobolev space defined to be the closure of the Schwartz functions with respect to the norm

and use this to state and prove an analogue of Theorem 45.

** — 5. Heuristics — **

There are several further extensions of these types of local and global existence results for smooth solutions, in which the role of the Sobolev spaces here is replaced by other function spaces. For instance, in three dimensions in the non-periodic setting, the role of the critical space was replaced by the larger critical space by Kato, and to the even larger space by Koch and Tataru, who also gave evidence that the latter space is essentially the limit of the method; in even larger spaces such as the Besov space , there are constructions of Bourgain and Pavlovic that demonstrate ill-posedness in the sense of “norm inflation” – solutions that start from arbitrarily small norm data but end up being arbitrarily large in arbitrarily small amounts of time. (This grossly abbreviated history skips over dozens of other results, both positive and negative, in yet further function spaces, such as Morrey spaces or Besov spaces. See for instance the recent text of Lemarie-Rieusset for a survey.)

Rather than detail these other results, let us present instead a *scaling heuristic* which can be used to interpret these results (and can clarify why all the positive well-posedness results discussed here involve either “subcritical” or “critical” function spaces, rather than “supercritical” ones). For simplicity we restrict our discussion to the non-periodic setting , although the discussion here could also be adapted without much difficulty to the periodic setting (which effectively just imposes an additional constraint on the frequency parameter to be introduced below).

In this heuristic discussion, we assume that at any given time , the velocity field is primarily located at a certain frequency (or equivalently, at a certain wavelength ) in the sense that the spatial Fourier transform is largely concentrated in the region . We also assume that at this time, the solution has an amplitude , in the sense that tends to be of order in magnitude in the region where it is concentrated. (We are deliberately leaving terms such as “concentrated” vague for the purposes of this discussion.) Using this ansatz, one can then heuristically compute the magnitude of various terms in the Navier-Stokes equations (1) or the projected version (33). For instance, if has amplitude and frequency , then should have amplitude (and frequency ), since the Laplacian operator multiplies the Fourier transform by ; one can also take a more “physical space” viewpoint and view the second derivatives in as being roughly like dividing out by the wavelength twice. Thus we see that the viscosity term in (1) or (33) should have size about . Similarly, the expression in (33) should have magnitude and frequency (or maybe slightly less due to cancellation), so and hence should have magnitude . The terms and in (1) can similarly be computed to have magnitude . Finally, if the solution oscillates (or blows up) in time in intervals of length (which one can think of as the natural time scale for the solution), then the term should have magnitude .

This leads to the following heuristics:

- If (or equivalently if ), then the viscosity term dominates the nonlinear terms in (1) or (33), and one should expect the Navier-Stokes equations to behave like the heat equation (23) in this regime. In particular solutions should exist and maintain (or even improve) their regularity as long as this regime persists. To balance the equation (1) or (33), one expects , so the natural time scale here is .
- If (or equivalently if ), then nonlinear effects dominate, and the behaviour is likely to be quite different to that of the heat equation. One now expects , so the natural time scale here is . In particular, one could theoretically have blowup or other bad behaviour after this time scale.
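These two regimes can be summarised in a tiny back-of-the-envelope calculator (our own illustrative names; the comparisons are the heuristic ones from the text, not rigorous statements). Writing `nu` for the viscosity, `A` for the amplitude, and `N` for the frequency:

```python
def classify_regime(nu, A, N):
    """Compare the heuristic sizes of the viscosity term (nu * N^2 * A) and the
    nonlinear term (N * A^2) for a solution of amplitude A at frequency N, and
    return the dominant regime together with its natural time scale."""
    if nu * N**2 * A >= N * A**2:          # equivalently A <= nu * N
        return "viscosity-dominated", 1.0 / (nu * N**2)
    else:                                  # A > nu * N: nonlinear effects dominate
        return "nonlinear", 1.0 / (A * N)
```

For instance, `classify_regime(1.0, 0.5, 10)` reports the viscosity-dominated regime with time scale $1/(\nu N^2) = 0.01$, while shrinking the viscosity to `1e-3` with amplitude `1.0` puts the same frequency in the nonlinear regime with time scale $1/(AN) = 0.1$.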

As a general rule of thumb, the known well-posedness theory for the Navier-Stokes equation is only applicable when the hypotheses on the initial data (and on the timescale being considered) is compatible either with the viscosity-dominated regime , or the time-limited regime . Outside of these regimes, we expect the evolution to be highly nonlinear in nature, and techniques such as the ones in this set of notes, which are primarily based on approximating the evolution by the linear heat flow, are not expected to apply.

Let’s discuss some of the results in this set of notes using these heuristics. Suppose we are given that the initial data is bounded in norm by some bound :

As in the above heuristics, we assume that exhibits some amplitude and frequency . Heuristically, the norm of should resemble times the norm of , which should be roughly , where is the volume of the region in which is concentrated. Thus we morally have a bound of the form

To use this bound, we invoke (at a heuristic level) the uncertainty principle , which indicates that the data should be spatially spread out at a scale of at least the wavelength , which implies that the volume should be at least . Thus we have

Suppose we have , then we have the crude bound

so we expect to have an amplitude bound . If we are in the nonlinear regime , this implies that , and so the natural time scale here is lower bounded by . This matches up with the local existence time given in Theorem 37 (or the non-periodic analogue of this theorem). However, the use of the crude bound (38) suggests that one can make improvements to this bound when is far from :

**Exercise 49** If , make a heuristic argument as to why the optimal lower bound for the time of existence for the Navier-Stokes equation in terms of the norm of the initial data should take the form

In a similar spirit, suppose we have the smallness hypothesis

on the critical norm , then a similar analysis to above leads to

and hence we will be in the viscosity dominated regime if is small enough, regardless of what time scale one uses; this is consistent with the global existence result in Theorem 45. On the other hand, if the norm is much larger than , then can be larger than , and we can fail to be in the viscosity dominated regime at any choice of frequency ; setting to be a large multiple of and sending to infinity, we see that the natural time scale could be arbitrarily small.

Finally, if one only controls a supercritical norm such as for some , this gives a bound on a quantity of the form , which allows one to leave the viscosity dominated regime (with plenty of room to spare) when is large, creating examples of initial data for which the natural time scale can be made arbitrarily small. As increases (restricting to, say, powers of two), the supercritical norm of these examples decays geometrically, so one can superimpose an infinite number of these examples together, leading to a choice of initial data with arbitrarily small supercritical norm for which the natural time scale is in fact zero. This strongly suggests that there is no good local well-posedness theory at such regularities.

**Exercise 50** Discuss the product estimate in Proposition 35, the Sobolev estimate in Exercise 36, and the energy estimates in Exercise 29(d) and Proposition 44 using the above heuristics.

**Remark 51** These heuristics can also be used to locate errors in many purported solutions to the Navier-Stokes global regularity problem that proceed through a sequence of estimates on a Navier-Stokes solution. At some point, the estimates have to rule out the scenario that the solution leaves the viscosity-dominated regime at larger and larger frequencies (and at smaller and smaller time scales ), with the time scales converging to zero to achieve a finite time blowup. If the estimates in the proposed solution are strong enough to heuristically rule out this scenario by the end of the argument, but not at the beginning of the argument, then there must be some step inside the argument where one moves from “supercritical” estimates that are too weak to rule out this scenario, to “critical” or “subcritical” estimates which are capable of doing so. This step is often where the error in the argument may be found.

The above heuristics are closely tied to the classification of various function space norms as being “subcritical”, “supercritical”, or “critical”. Roughly speaking, a norm is subcritical if bounding that norm heuristically places one in the linear-dominated regime (which, for Navier-Stokes, is the viscosity-dominated regime) at high frequencies; critical if control of the norm very nearly places one in the linear-dominated regime at high frequencies; and supercritical if control of the norm completely fails to place one in the linear-dominated regime at high frequencies. When the equation in question enjoys a scaling symmetry, the distinction between subcritical, supercritical, and critical norms can be made by seeing how the top-order component of these norms varies with respect to scaling a function to be high frequency. In the case of the Navier-Stokes equations (1), the scaling is given by the formulae

with the initial data similarly being scaled to

Here is a scaling parameter; as , the functions are being sent to increasingly fine scales (i.e., high frequencies). One easily checks that if solves the Navier-Stokes equations (1) with initial data , then solves the same equations with initial data ; similarly for other formulations of the Navier-Stokes equations such as (33) or (34). (In terms of the parameters from the previous heuristic discussion, this scaling corresponds to the map .)

Typically, if one considers a function space norm of (or of or ) in the limit , the top order behaviour will be given by some power of . A norm is called *subcritical* if the exponent is positive, *supercritical* if the exponent is negative, and *critical* if the exponent is zero. For instance, one can calculate the Fourier transform

and hence

As , this expression behaves like to top order; hence the norm is subcritical when , supercritical when , and critical when .
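This computation can be made explicit for the homogeneous Sobolev norms. The following is a reconstruction, using the standard Navier-Stokes scaling $u^{(\lambda)}(t,x) = \lambda^{-1} u(t/\lambda^2, x/\lambda)$ and $u_0^{(\lambda)}(x) = \lambda^{-1} u_0(x/\lambda)$:

```latex
% The Fourier transform of the rescaled data is
\widehat{u_0^{(\lambda)}}(\xi) \;=\; \lambda^{d-1}\, \widehat{u_0}(\lambda \xi),
% so a change of variables in the Sobolev integral gives
\| u_0^{(\lambda)} \|_{\dot H^s(\mathbb{R}^d)}
  \;=\; \lambda^{\frac{d}{2} - 1 - s}\, \| u_0 \|_{\dot H^s(\mathbb{R}^d)},
% which diverges as \lambda \to 0 precisely when s > d/2 - 1 (the subcritical
% range), is scale-invariant at the critical exponent s = d/2 - 1, and decays
% in the supercritical range s < d/2 - 1.
```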

Another way to phrase this classification is to use dimensional analysis. If we use to denote the unit of length, and the unit of time, then the velocity field should have units , and the terms and in (1) then have units . To be dimensionally consistent, the kinematic viscosity must then have the units , and the pressure should have units . (This differs from the usual units given in physics to the pressure, which is where is the unit of mass; the discrepancy comes from the choice to normalise the density, which usually has units , to equal .) If we fix to be a dimensionless constant such as , this forces a relation between the time and length units, so now and have the units and respectively (compare with (39) and (40)). Of course will then also have units . One can then declare a function space norm of , , or to be subcritical if its top order term has units of a negative power of , supercritical if this is a positive power of , and critical if it is dimensionless. For instance, the top order term in is the norm of ; as has the units of , and Lebesgue measure has the units of , we see that has the units of , giving the same division into subcritical, supercritical, and critical spaces as before.

** — 6. Appendix: some Littlewood-Paley theory — **

We now prove Proposition 35. By a limiting argument it suffices to establish the claim for smooth . The claim is immediate from Hölder’s inequality when , so we will assume . For brevity we shall abbreviate as , and similarly for , etc.

We use the technique of Littlewood-Paley projections. Let be an even bump function (depending only on ) that equals on and is supported on ; for the purposes of asymptotic notation, any bound that depends on can thus be thought of as depending on instead. For any dyadic integer (by which we mean an integer that is a power of ), define the *Littlewood-Paley projections* on periodic smooth functions by the formulae

and (for )

so one has the *Littlewood-Paley decomposition*

Here and in the sequel is always understood to be restricted to be a dyadic integer.
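The omitted formulae are presumably the standard ones; a hedged reconstruction, writing $\hat f(k)$ for the Fourier coefficients of a smooth periodic $f$ and $\varphi$ for the bump function fixed above:

```latex
P_{\leq N} f(x) := \sum_{k \in \mathbf{Z}^{d}} \varphi\!\left(\frac{k}{N}\right) \hat f(k)\, e^{2\pi i k \cdot x},
\qquad
P_{N} := P_{\leq N} - P_{\leq N/2} \ \ (N \geq 2), \qquad P_{1} := P_{\leq 1},
% so that one has the telescoping Littlewood-Paley decomposition
f = \sum_{N \geq 1} P_{N} f.
```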

The key point of this decomposition is that the and Sobolev norms of the individual components of this decomposition are easier to estimate than the original function . The following estimates in particular will suffice for our applications:

**Exercise 52 (Basic Littlewood-Paley estimates)**

- (a) For any dyadic integer , show that
where is the inverse Fourier transform of . In particular, if is real-valued then so are and . Conclude the *Bernstein inequality*

for all smooth functions , all and ; in particular

By the triangle inequality, the same estimates also hold for , .
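The Bernstein inequality alluded to in part (a) is usually stated as follows (a reconstruction of the omitted display; the exponents are the standard ones for $d$ dimensions and $1 \leq p \leq q \leq \infty$):

```latex
\| P_{N} f \|_{L^{q}} \lesssim_{d,p,q} N^{\frac{d}{p} - \frac{d}{q}}\, \| P_{N} f \|_{L^{p}},
\qquad
\| \nabla P_{N} f \|_{L^{p}} \lesssim_{d} N\, \| P_{N} f \|_{L^{p}}.
```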

- (b) For any , show that

**Remark 53** The more advanced Littlewood-Paley inequality, which is usually proven using the Calderón-Zygmund theory of singular integrals, asserts that

for any . However, we will not use this estimate here.

We return now to the proof of Proposition 35. Let be as above. By Exercise 52, it suffices to establish the bounds

The estimate (41) follows by dropping (using Exercise 52) and applying Hölder’s inequality, so we turn to (42). We may restrict attention to those terms where (say) since the other terms can be treated by the same argument used to prove (41).

The basic strategy here is to split the product (or the component of this product) into paraproducts in which some constraint is imposed between the frequencies of the and terms. There are many ways to achieve this splitting; we will use

By the triangle inequality, it suffices to show the estimates

We begin with (43). We can expand further

The key point now is that (by inspecting the Fourier series expansions) the first term on the RHS vanishes, and the summands in the second term also vanish unless . Thus

and the claim follows by summing in , interchanging the summations, and using Exercise 52. Now we prove (44). We bound

using Cauchy-Schwarz, and the claim again follows by summing in , interchanging the summations, and using Exercise 52.
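One standard way to realise the paraproduct splitting used above (not necessarily the exact decomposition in the omitted display) is the exact identity obtained by splitting the double Littlewood-Paley sum according to which frequency dominates:

```latex
fg \;=\; \sum_{N} (P_{N} f)(P_{\leq N} g) \;+\; \sum_{N} (P_{\leq N/2} f)(P_{N} g),
% obtained from fg = \sum_{N,M} (P_N f)(P_M g) by summing the cases
% M \leq N and M > N separately.
```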

There is an essentially identical theory in the non-periodic setting, in which the role of smooth periodic functions is now played by Schwartz functions, the Littlewood-Paley projections are now defined as

and is defined as before.

**Exercise 54 (Non-periodic Littlewood-Paley theory)** With now denoting instead of , and similarly for other function spaces, establish the non-periodic analogue of Exercise 52 for Schwartz functions .

In particular, one obtains the non-periodic analogue of Proposition 35 by repeating the proof verbatim.

### Benoist’s minicourse “Arithmeticity of discrete groups”. Lecture IV: Other examples

In this last post of this series, we want to complete the discussion of the Oh–Benoist–Miquel theorem by giving a sketch of its proof in the cases not covered in previous posts.

More precisely, let us recall that the Oh–Benoist–Miquel theorem (answering a conjecture of Margulis) asserts that:

**Theorem 1** *Let be a semisimple algebraic Lie group of real rank . Denote by a horospherical subgroup of . If is a discrete Zariski-dense and irreducible subgroup such that is cocompact, then is commensurable to an arithmetic lattice .*

Moreover, we recall that the proof of this result was worked out in the previous two posts of this series for and . Furthermore, we observed *en passant* that these arguments can be generalized without too much effort to yield a proof of Theorem 1 *when*

- is commutative;
- is reflexive (i.e., is conjugate to an opposite horospherical subgroup );
- is not compact (where , , and ).

Today, we will divide our discussion below into five sections discussing prototypical examples covering all possible remaining cases for and .

**Remark 1** *The fact that Theorem 1 holds for the examples in Sections 1, 2 and 3 below is originally due to Oh. Similarly, the example in Section 4 was originally treated by Selberg. Finally, the original proof of Theorem 1 for the example in Section 5 is due to Benoist–Oh. Nevertheless, except for Section 3, the arguments discussed below are some particular examples illustrating the general strategy of Benoist–Miquel and, hence, they provide some proofs which are different from the original ones.*

**1. is not reflexive**

The prototype example of this case is and .

The corresponding parabolic subgroup is the stabilizer of the line :

Equivalently, is the stabilizer of the flag . Therefore, is *not* reflexive because its opposite is the stabilizer of a *plane*.

Since is Zariski-dense in , we can find such that is a basis of . Hence, there is no loss in generality in assuming that and . In this setting,

Also, we know that and are compact. Moreover, is a discrete and Zariski dense subgroup of the semi-direct product

of and .

In this context, a key fact is the following result of Auslander (compare with Proposition 4.17 in Benoist–Miquel paper):

**Theorem 2 (Auslander)** *Let be an algebraic subgroup obtained from a semi-direct product of semisimple and solvable, and denote by the natural projection. If is discrete and Zariski dense, then is also discrete and Zariski dense in .*

The information about the discreteness of the projection in the previous statement is extremely precious for our purposes. Indeed, Auslander theorem implies that the projections and are discrete. Using these facts, one checks that

By repeating this argument with and in the place of , one can “fill all non-diagonal entries”, that is, one essentially gets that contains finite-index subgroups of

so that Raghunathan–Venkataramana–Oh theorem (stated in the previous post of this series) guarantees that is commensurable to .

This completes our sketch of proof of Theorem 1 for our prototype of non-reflexive subgroup above.

**2. is Heisenberg and is not compact**

A *Heisenberg* horospherical subgroup is a -step nilpotent group whose associated parabolic group acts by similarities (of some Euclidean norm) on the center of the Lie algebra of .

A prototypical example of Heisenberg and non-compact is and

As it turns out, any Heisenberg is reflexive. Thus, we have that is opposite to for some adequate choice .

In particular, it is tempting to mimic the arguments from the second and third posts of this series: namely, one introduces the lattices

so that the arithmeticity of follows from the closedness of the -orbit of in when is not compact; moreover, the closedness of is basically a consequence of the closedness and discreteness of in for an appropriate choice of polynomial function .

In the case of *commutative*, we took , where , and was the natural projection with respect to the decomposition .

As it turns out, the case of Heisenberg can be dealt with by slightly modifying the construction in the previous paragraph. More precisely, one considers a natural graduation

and one sets , , , and is the natural projection . In our prototypical example, the polynomial function is very explicit:

This completes our sketch of proof of Theorem 1 when is Heisenberg and is not compact.

**3. is not commutative and is not Heisenberg**

Our prototype of non-commutative and non-Heisenberg is the subgroup

of .

In this context, we will explore some well-known results from the theory of lattices in nilpotent groups to reduce our task to the case of commutative and reflexive.

More concretely, the properties of nilpotent groups together with our hypothesis that is a lattice in allow one to conclude that is a lattice in

and, consequently, the centralizer of in is a lattice in the centralizer

of in . Therefore, we reduced matters to the case of commutative and reflexive which was discussed in the previous two posts of this series.

In particular, our sketch of proof of Theorem 1 when is non-commutative and non-Heisenberg is complete.

**4. commutative and is compact**

The basic example of commutative and compact is and

In this setting, we consider

the common Levi subgroup of and the parabolic subgroup normalizing an opposite of , and the “unimodular Levi subgroup”

The discussion in the second post of this series ensures that the -orbit of is closed in .

We affirm that is *compact*. Indeed, this fact can be proved via Mahler’s compactness criterion: more concretely, recall from the second post of this series that the proof of the closedness of produced a polynomial on which is -invariant and whose values on form a closed and discrete subset of ; in our prototypical example, a direct computation shows that

in particular, ; therefore, the -invariance of together with the closedness and discreteness of imply that

since is *irreducible*, , and, *a fortiori*, there are no arbitrarily short non-trivial vectors in the closed family of lattices ; hence, we can apply Mahler’s compactness criterion to complete the proof of our affirmation.

At this point, we observe that is not compact (because ), so that the compactness of means that the stabilizer of this orbit is infinite. Consequently, is infinite, and a quick inspection of the previous post reveals that this is *precisely* the information needed to apply Margulis’ construction of -forms and Raghunathan–Venkataramana theorem in order to derive the arithmeticity of . This completes our sketch of proof of Theorem 1 when is commutative and is compact (and the reader is invited to consult Section 4.6 of Benoist–Miquel paper for more details).

**5. is Heisenberg and compact**

Closing this series of posts, let us discuss the remaining case of Heisenberg and compact. A concrete example of this situation is and

In this context, and an unimodular Levi subgroup is

Once again, let us recall that we know that is closed, where

We affirm that there is no loss of generality in assuming that and for all . Indeed, if this is not the case (say for some ), then we are back to the setting of Section 1 above (of the horospherical subgroup ).

Here, we can derive the arithmeticity of along the same lines as in Section 4 above (where it sufficed to study an appropriate polynomial to employ Mahler’s compactness criterion). More precisely, one uses the fact that and for all to prove that is compact, so that is infinite and, thus, by Margulis’ construction of -forms and the Raghunathan–Venkataramana theorem, is arithmetic.

### Additional thoughts on the Ted Hill paper

First, I’d like to thank the large number of commenters on my previous post for keeping the discussion surprisingly calm and respectful given the topic discussed. In that spirit, and to try to practise the scientific integrity that I claimed to care about, I want to acknowledge that my views about the paper have changed somewhat as a result of the discussion. My understanding of the story of what happened to the paper has changed even more now that some of those attacked in Ted Hill’s Quillette article have responded, but about that I only want to repeat what I said in one or two comments on the previous post: that my personal view is that one should not “unaccept” or “unpublish” a paper unless something was improper about the way it was accepted or published, and that that is also the view of the people who were alleged to have tried to suppress Ted Hill’s paper on political grounds. I would also remark that whatever happened at NYJM would not have happened if all decisions had to be taken collectively by the whole editorial board, which is the policy on several journals I have been on the board of. According to Igor Rivin, the policy at NYJM is very different: “No approval for the full board is required, or ever obtained. The approval of the Editor in Chief is not required.” I find this quite extraordinary: it would seem to be a basic safeguard that decisions should be taken by more than one person — ideally many more.

To return to the paper, I now see that the selectivity hypothesis, which I said I found implausible, was actually quite reasonable. If you look carefully at my previous post, you will see that I actually started to realize that even when writing it, and it would have been more sensible to omit that criticism entirely, but by the time it occurred to me that ancient human females could well have been selective in a way that could (in a toy model) be reasonably approximated by Hill’s hypothesis, I had become too wedded to what I had already written — a basic writer’s mistake, made in this case partly because I had only a short window of time in which to write the post. I’m actually quite glad I left the criticism in, since I learnt quite a lot from the numerous comments that defended the hypothesis.

I had a similar experience with a second criticism: the idea of dividing the population up into two subpopulations. That still bothers me somewhat, since in reality we all have large numbers of genes that interact in complicated ways and it is not clear that a one-dimensional model will be appropriate for a high-dimensional feature space. But perhaps for a toy model intended to start a discussion that is all right.

While I’m at it, some commenters on the previous post came away with the impression that I was against toy models. I agree with the following words, which appeared in a book that was published in 2002.

There are many ways of modelling a given physical situation and we must use a mixture of experience and further theoretical considerations to decide what a given model is likely to teach us about the world itself. When choosing a model, one priority is to make its behaviour correspond closely to the actual, observed behaviour of the world. However, other factors, such as simplicity and mathematical elegance, can often be more important. Indeed, there are very useful models with almost no resemblance to the world at all …

But that’s not surprising, since I was the author of the book.

But there is a third feature of Hill’s model that I still find puzzling. Some people have tried to justify it to me, but I found that either I understood the justifications and found them unconvincing or I didn’t understand them. I don’t rule out the possibility that some of the ones I didn’t understand were reasonable defences of this aspect of the model, but let me lay out once again the difficulty I have.

To do this I’ll briefly recall Hill’s model. You have two subpopulations and of, let us say, the males of a species. (It is not important for the model that they are male, but that is how Hill hopes the model will be applied.) The distribution of desirability of subpopulation is more spread out than that of subpopulation , so if the females of the species choose to reproduce only with males above a rather high percentile of desirability, they will pick a greater proportion of subpopulation than of subpopulation .

A quick aside is that what I have just written is more or less the entire actual content (as opposed to surrounding discussion) of Hill’s paper. Of course, he has to give a precise definition of “more spread out”, but it is very easy to come up with a definition that will give the desired conclusion after a one-line argument, and that is what he does. He also gives a continuous-time version of the process. But I’m not sure what adding a bit of mathematical window dressing really adds, since the argument in the previous paragraph is easy to understand and obviously correct. But of course without that window dressing the essay couldn’t hope to sell itself as a mathematics paper.
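The selection mechanism sketched above is easy to illustrate numerically. Here is a minimal simulation (all parameters are hypothetical, chosen only to make the effect visible): two subpopulations with the same median desirability but different spread, with selection above a high percentile of the combined population.

```python
import random
import statistics

random.seed(0)

# Two subpopulations with the same median desirability but different
# spread (hypothetical parameters, for illustration only).
N = 100_000
pop_a = [random.gauss(0.0, 2.0) for _ in range(N)]  # more variable
pop_b = [random.gauss(0.0, 1.0) for _ in range(N)]  # less variable

# Selection: only individuals above (roughly) the 95th percentile of
# the combined population are chosen.
cutoff = statistics.quantiles(pop_a + pop_b, n=100)[94]

frac_a = sum(x > cutoff for x in pop_a) / N
frac_b = sum(x > cutoff for x in pop_b) / N

# A greater proportion of the more variable subpopulation is selected.
print(frac_a > frac_b)  # True
```

Of course, this says nothing about what happens to the within-population distributions in the next generation, which is exactly the feature of the model being questioned below.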

The curious feature of the model, and the one that I still find hard to accept, is that Hill assumes, and absolutely needs to assume, that the only thing that can change is the sizes of the subpopulations and not the distributions of desirability within those populations. So if, for example, what makes a male desirable is height, and if the average heights in the two populations are the same, then even though females refuse to reproduce with anybody who isn’t unusually tall, the average height of males remains the same.

The only way this strange consequence can work, as far as I can see, is if instead of there being a gene (or combination of genes) that makes men tall, there is a gene that has some complicated effect of which a side-effect is that the height of men is more variable, and moreover there aren’t other genes that simply cause tallness.

It is hard to imagine what the complicated effect might be in the case of height, but it is not impossible to come up with speculations about mathematical ability. For example, maybe men have, as has been suggested, a tendency to be a bit further along the autism spectrum than women, which causes some of them to become very good at mathematics and others to lack the social skills to attract a mate. But even by the standards of evolutionary just-so stories, that is not a very good one. Our prehistoric ancestors were not doing higher mathematics, so we would need to think of some way that being on the spectrum could have caused a man *at that time* to become highly attractive to women. One has to go through such contortions to make the story work, when all along there is the much more straightforward possibility that there is some complex mix of genes that go towards making somebody intelligent, and that if prehistoric women went for intelligent men, then those genes would be selected for. But if that is what happened, then the proportion of less intelligent men would go down, and therefore the variability would go down.

While writing this, I have realized that there is a crucial assumption of Hill’s, the importance of which I had not appreciated. It’s that the medians of his two subpopulations are the same. Suppose instead that the individuals in male population are on average more desirable than the individuals in male population . Then even if population is *less* variable than population , if females are selective, it may very well be that a far higher proportion of population is chosen than of population , and therefore there will be a tendency for the variability of the combined population to decrease. In fact, we don’t even need to assume that is less variable than : if the population as a whole becomes dominated by , it may well be less variable than the original combination of populations and .

So for Hill’s model to work, it needs a fairly strange and unintuitive combination of hypotheses. Therefore, if he proposes it as a potential explanation for greater variability amongst males, he needs to argue that this combination of hypotheses might actually have occurred for many important features. For example, if it is to explain greater variability for males in mathematics test scores, then he appears to need to argue (i) that there was a gene that made our prehistoric male ancestors more variable with respect to some property that at one end of the scale made them more desirable to females, (ii) that this gene had no effect on average levels of desirability, (iii) that today this curious property has as a side-effect greater variability in mathematics test scores, and (iv) that this tendency to increase variability is not outweighed by reduction of variability due to selection of other genes that do affect average levels. (Although he explicitly says that he is not trying to explain any particular instance of greater variability amongst males, most of the references he gives concerning such variability are to do with intellectual ability, and if he can’t give a convincing story about that, then why have all those references?)

Thus, what I object to is not the very idea of a toy model, but more that with this particular toy model I have to make a number of what seem to me to be highly implausible assumptions to get it to work. And I don’t mean the usual kind of entirely legitimate simplifying assumptions. Rather, I’m talking about artificial assumptions that seem to be there only to get the model to do what Hill wants it to do. If some of the hypotheses above that seem implausible to me have in fact been observed by biologists, it seems to me that Hill should have included references to the relevant literature in his copious bibliography.

As with my previous post, I am not assuming that everything I’ve just written is right, and will be happy to be challenged on the points above.

### On the recently removed paper from the New York Journal of Mathematics

In the last week or so there has been some discussion on the internet about a paper (initially authored by Hill and Tabachnikov) that was initially accepted for publication in the Mathematical Intelligencer, but with the editor-in-chief of that journal later deciding against publication; the paper, in significantly revised form (and now authored solely by Hill), was then quickly accepted by one of the editors in the New York Journal of Mathematics, but then was removed from publication after objections from several members on the editorial board of NYJM that the paper had not been properly refereed or was not within the scope of the journal; see this statement by Benson Farb, who at the time was on that board, for more details. Some further discussion of this incident may be found on Tim Gowers’ blog; the most recent version of the paper, as well as a number of prior revisions, are still available on the arXiv here.

For whatever reason, some of the discussion online has focused on the role of Amie Wilkinson, a mathematician from the University of Chicago (and who, incidentally, was a recent speaker here at UCLA in our Distinguished Lecture Series), who wrote an email to the editor-in-chief of the Intelligencer raising some concerns about the content of the paper and suggesting that it be published alongside commentary from other experts in the field. (This, by the way, is not uncommon practice when dealing with a potentially provocative publication in one field by authors coming from a different field; for instance, when Emmanuel Candès and I published a paper in the Annals of Statistics introducing what we called the “Dantzig selector”, the Annals solicited a number of articles discussing the selector from prominent statisticians, and then invited us to submit a rejoinder.) It seems that the editors of the Intelligencer decided instead to reject the paper. The paper then had a complicated interaction with NYJM, but, as stated by Wilkinson in her recent statement on this matter as well as by Farb, this was done without any involvement from Wilkinson. (It is true that Farb happens to also be Wilkinson’s husband, but I see no reason to doubt their statements on this matter.)

I have not interacted much with the Intelligencer, but I have published a few papers with NYJM over the years; it is an early example of a quality “diamond open access” mathematics journal. It seems that this incident may have uncovered some issues with their editorial procedure for reviewing and accepting papers, but I am hopeful that they can be addressed to avoid this sort of event occurring again.

### The structure of correlations of multiplicative functions at almost all scales, with applications to the Chowla and Elliott conjectures

Joni Teräväinen and I have just uploaded to the arXiv our paper “The structure of correlations of multiplicative functions at almost all scales, with applications to the Chowla and Elliott conjectures“. This is a sequel to our previous paper that studied logarithmic correlations of the form

where were bounded multiplicative functions, were fixed shifts, was a quantity going off to infinity, and was a generalised limit functional. Our main technical result asserted that these correlations were necessarily the uniform limit of periodic functions . Furthermore, if (weakly) pretended to be a Dirichlet character , then the could be chosen to be –isotypic in the sense that whenever are integers with coprime to the periods of and ; otherwise, if did not weakly pretend to be any Dirichlet character , then vanished completely. This was then used to verify several cases of the logarithmically averaged Elliott and Chowla conjectures.

The purpose of this paper was to investigate the extent to which the methods could be extended to non-logarithmically averaged settings. For our main technical result, we now considered the unweighted averages

where is an additional parameter. Our main result was now as follows. If did not weakly pretend to be a twisted Dirichlet character , then converged to zero on (doubly logarithmic) average as . If instead did pretend to be such a twisted Dirichlet character, then converged on (doubly logarithmic) average to a limit of -isotypic functions . Thus, roughly speaking, one has the approximation

for most .

Informally, this says that at almost all scales (where “almost all” means “outside of a set of logarithmic density zero”), the non-logarithmic averages behave much like their logarithmic counterparts except for a possible additional twisting by an Archimedean character (which interacts with the Archimedean parameter in much the same way that the Dirichlet character interacts with the non-Archimedean parameter ). One consequence of this is that most of the recent results on the logarithmically averaged Chowla and Elliott conjectures can now be extended to their non-logarithmically averaged counterparts, so long as one excludes a set of exceptional scales of logarithmic density zero. For instance, the Chowla conjecture

is now established for either odd or equal to , so long as one excludes an exceptional set of scales.
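For reference, the unweighted Chowla conjecture being discussed can be written as follows (a standard formulation; the shifts and the averaging are as in the surrounding discussion):

```latex
% Chowla conjecture for the Liouville function \lambda: for any fixed
% distinct integers h_1, \dots, h_k,
\frac{1}{x} \sum_{n \leq x} \lambda(n + h_{1}) \cdots \lambda(n + h_{k}) \;\longrightarrow\; 0
\qquad (x \to \infty).
```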

In the logarithmically averaged setup, the main idea was to combine two very different pieces of information on . The first, coming from recent results in ergodic theory, was to show that was well approximated in some sense by a nilsequence. The second was to use the “entropy decrement argument” to obtain an approximate isotopy property of the form

for “most” primes and integers . Combining the two facts, one eventually finds that only the almost periodic components of the nilsequence are relevant.

In the current situation, each is approximated by a nilsequence, but the nilsequence can vary with (although there is some useful “Lipschitz continuity” of this nilsequence with respect to the parameter). Meanwhile, the entropy decrement argument gives an approximation basically of the form

for “most” . The arguments then proceed largely as in the logarithmically averaged case. A key lemma to handle the dependence on the new parameter is the following cohomological statement: if one has a map that was a quasimorphism in the sense that for all and some small , then there exists a real number such that for all small . This is achieved by applying a standard “cocycle averaging argument” to the cocycle .
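The quasimorphism lemma can be written schematically as follows (a hedged reconstruction from the surrounding prose; the precise domains and error tolerances in the paper may differ):

```latex
% If f obeys the approximate homomorphism (quasimorphism) property
| f(x + y) - f(x) - f(y) | \leq \varepsilon \qquad \text{for all } x, y,
% then there exists a real number \alpha with
| f(x) - \alpha x | \lesssim \varepsilon \qquad \text{for all small } x.
```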

It would of course be desirable to not have the set of exceptional scales. We only know of one (implausible) scenario in which we can do this, namely when one has far fewer (in particular, subexponentially many) sign patterns for (say) the Liouville function than predicted by the Chowla conjecture. In this scenario (roughly analogous to the “Siegel zero” scenario in multiplicative number theory), the entropy of the Liouville sign patterns is so small that the entropy decrement argument becomes powerful enough to control all scales rather than almost all scales. On the other hand, this scenario seems to be self-defeating, in that it allows one to establish a large number of cases of the Chowla conjecture, and the full Chowla conjecture is inconsistent with having unusually few sign patterns. Still it hints that future work in this direction may need to split into “low entropy” and “high entropy” cases, in analogy to how many arguments in multiplicative number theory have to split into the “Siegel zero” and “no Siegel zero” cases.

### Benoist’s minicourse “Arithmeticity of discrete groups”. Lecture III: From closedness to arithmeticity

Recall that the main goal of this series of posts is the proof of the following result:

**Theorem 1** *Let be a semisimple algebraic Lie group of real rank . Denote by a horospherical subgroup of . If is a discrete Zariski-dense and irreducible subgroup such that is cocompact, then is commensurable to an arithmetic lattice .*

Last time, we discussed the first half of the proof of this theorem in the particular case of , , and . Actually, we saw that this specific form of is not very important: all results from the previous post hold whenever

- is *reflexive*: in the context of the example above, this is the fact that is conjugate to the opposite horospherical subgroup ;
- is *commutative*.

Indeed, we observed that reflexive allows one to also assume that is cocompact in . Then, this property and the commutativity of were exploited to establish the closedness of the -orbit of in , where , , and

is the common Levi subgroup of the parabolic subgroups and normalizing and .

Today, we will discuss the second half of the proof of Theorem 1 in the particular case of , , and : in other terms, our goal below is to obtain the arithmeticity of from the closedness of in the homogeneous space . This step is due to Hee Oh (see Proposition 3.4.4 of her paper).

**1. From closedness to infinite stabilizer**

Let and , so that . In particular, the closedness of implies that

is closed in .

The next proposition asserts that the stabilizer of this orbit is large whenever is not compact:

**Proposition 2** *The stabilizer is a lattice in .*

This proposition is a direct consequence of the closedness of the in and the following general fact:

**Proposition 3** *Let be a Lie group, a lattice in , and . Suppose that is a semisimple subgroup with finite center such that is closed. Then is a lattice in .*

*Proof:* The first ingredient of the argument is *Howe–Moore’s mixing theorem*: it asserts that if is a semisimple group with finite center and is a unitary representation of with , then

for all . (Here, means that the projection of to any simple factor of diverges.)
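In symbols, Howe–Moore mixing asserts (standard formulation, with the notation of the statement: $\pi$ a unitary representation of the semisimple group $G$ on a Hilbert space $H$ with no non-zero invariant vectors):

```latex
\langle \pi(g)\, v, w \rangle \;\longrightarrow\; 0
\qquad \text{as } g \to \infty \text{ in } G,
% for all v, w \in H, where g \to \infty means that the projection of g
% to each simple factor of G eventually leaves every compact set.
```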

The second ingredient of the argument is *Dani–Margulis recurrence theorem*: it says that if is a lattice in a Lie group and is a one-parameter unipotent subgroup of , then, given and , there exists a compact subset such that

for all .
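In symbols, the Dani–Margulis recurrence theorem says (standard formulation: $\Lambda$ a lattice in $G$, $(u_t)$ a one-parameter unipotent subgroup, $x \in G/\Lambda$, $\varepsilon > 0$, and $K \subseteq G/\Lambda$ the compact set produced by the theorem):

```latex
\frac{1}{T}\,\mathrm{Leb}\bigl\{\, t \in [0, T] : u_{t}\, x \in K \,\bigr\} \;\geq\; 1 - \varepsilon
\qquad \text{for all } T > 0.
```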

The basic idea to obtain the desired proposition is to apply these ingredients to , where is a -invariant measure on , and is a one-parameter unipotent subgroup in the product of non-compact simple factors of . Here, we observe that is a *bona fide* Radon measure because we are assuming that is closed, and, if we take not contained in proper normal subgroups of , then as thanks to the absence of compact factors in . In this setting, our task is reduced to proving that is a *finite* measure.

In this direction, we apply the Dani–Margulis recurrence theorem to get with and a compact subset such that

for all and . In this way, we obtain that the characteristic functions and of and are two elements of with , and, hence, by Fubini’s theorem,

for all . It follows from Howe–Moore’s mixing theorem that

that is, there exists a non-zero function which is -invariant. By averaging over the product of the compact factors of if necessary, we obtain a non-zero function which is -invariant. By ergodicity of , we have that is constant and, *a fortiori*, is a finite measure.

**2. From infinite stabilizer to infinite**

Our previous discussions (about -actions) paved the way to understand . Intuitively, it is important to get some information about on our way towards showing the arithmeticity of because we already know that and are lattices (i.e., projects to lattices in “other directions”).

The intuition in the previous paragraph is confirmed by *Margulis construction of -forms*:

**Theorem 4 (Margulis)** *If is infinite, then is contained in some -form of .*

We will discuss the proof of this result in the next section. For now, we want to exploit the information in order to derive the arithmeticity of . For this sake, we invoke the following result:

**Theorem 5 (Raghunathan-Venkataramana)** *Assume that is semisimple of defined over , and is -simple. Suppose that and are opposite horospherical subgroups defined over . If is a subgroup such that , resp. , has finite index in , resp. , then has finite index in .*

**Remark 1** *As it was kindly pointed out to me by David Fisher, Raghunathan-Venkataramana theorem is due to Margulis in the cases covered by Raghunathan (at least).*

**Remark 2** *This result is false when , e.g., (and ).*

Roughly speaking, the Raghunathan-Venkataramana theorem establishes the desired Theorem 1 provided we know in advance that .

In the sequel, we will treat Raghunathan-Venkataramana theorem as a *blackbox* and we will complete the proof of Theorem 1 in the case reflexive and commutative, and *non-compact* such as and .

**Remark 3** *In some natural situations (e.g., subgroups generated by the matrices of the so-called Kontsevich–Zorich cocycle) we have that . In particular, it is a pity that, for lack of time, Yves Benoist could not explain to me the proof of the Raghunathan-Venkataramana theorem. Anyhow, I hope to come back to this point in more detail in the future.*

*Proof of Theorem 1:* We consider the subgroup of . It is discrete and Zariski dense in . Therefore, the normalizer is Zariski dense in , and it is not hard to check that it is also discrete.

Observe that Proposition 2 says that is a lattice in (because ). Since is non-compact, we have that is infinite. Thus, Margulis’ Theorem 4 implies that for some -form of .

Note also that and are defined over : in fact, if is an algebraic subgroup such that is Zariski dense in , then is defined over .

Hence, we can apply Raghunathan-Venkataramana Theorem 5 to get that is commensurable to . Since , this proves the arithmeticity of .

**Remark 4** *As we noticed above, the arguments presented so far allow us to prove Theorem 1 when is reflexive and commutative, and is non-compact.*

**3. From infinite to arithmeticity**

In this (final) section (of this post), we discuss some steps in the proof of Margulis’ Theorem 4 stated above.

We write , i.e., we decompose the Lie algebra of in terms of the Lie algebras of , and .
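In the usual notation, writing the Lie algebras of the two opposite horospherical subgroups and of the common Levi subgroup as u, l, and u⁻, this decomposition takes the standard form:

```latex
\mathfrak{g} \;=\; \mathfrak{u} \,\oplus\, \mathfrak{l} \,\oplus\, \mathfrak{u}^{-}.
```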

Our goal is to find a -form of containing . For this sake, let us do some “reverse engineering”: assuming that we found , what are the constraints satisfied by the -structure on its Lie algebra?

First, we note that we have at our disposal the lattices and . Hence, we are “forced” to define and as the -vector spaces spanned by and .

Next, we observe that the choice of above imposes a natural -structure on via the adjoint map . In fact, is defined over because we know that is a lattice (and, hence, Zariski-dense) in and (by definition).

Finally, since , and are already defined over and we want a -structure on , it remains to put a -structure on . In general this is not difficult: for instance, we can take

in the context of the example and .

Once we have understood the constraints on a -form of containing , we can work backwards and set

where , , and are the -structures from the previous paragraphs.

At this point, the proof of Theorem 4 is complete once we show the following facts:

- doesn’t depend on the choices (of , etc.);
- for all ;
- is a Lie algebra.

The proof of these statements is described in the proof of Proposition 4.11 of the Benoist–Miquel paper. Closing this post, let us just make some comments on the independence of on the choices. For this sake, suppose that is another choice of horospherical subgroup. Denote by its Lie algebra, and let be the associated parabolic subgroup. Our task is to verify that the Lie algebra is defined over . In this direction, the basic strategy is to reduce to the case when is opposite to and in order to get . Finally, during the implementation of this strategy, one relies on the following properties, discussed in Lemma 4.8 of the Benoist–Miquel article, of the action of the *unimodular normalizers* of horospherical subgroups on the space with basepoint :

- If is compact, then is closed;
- If and are closed, then is closed;
- If is compact and is closed, then has finite index in .

In any event, this completes our discussion of Theorem 4. In particular, we gave a (sketch of) proof of Theorem 1 when is commutative and reflexive, and is non-compact (cf. Remark 4).

Next time, we will establish Theorem 1 in the remaining cases of and .

### Has an uncomfortable truth been suppressed?

**Update to post, added 11th September.** As expected, there is another side to the story discussed below. See this statement about the decision by the Mathematical Intelligencer and this one about the decision taken by the New York Journal of Mathematics.

**Further update, added 15th September.** The author has also made a statement.

I was disturbed recently by reading about an incident in which a paper was accepted by the Mathematical Intelligencer and then rejected, after which it was accepted and published online by the New York Journal of Mathematics, where it lasted for three days before disappearing and being replaced by another paper of the same length. The reason for this bizarre sequence of events? The paper concerned the “variability hypothesis”, the idea, apparently backed up by a lot of evidence, that there is a strong tendency for traits that can be measured on a numerical scale to show more variability amongst males than amongst females. I do not know anything about the quality of this evidence, other than that there are many papers that claim to observe greater variation amongst males of one trait or another, so that if you want to make a claim along the lines of “you typically see more males both at the top and the bottom of the scale” then you can back it up with a long list of citations.

You can see, or probably already know, where this is going: some people like to claim that the reason that women are underrepresented at the top of many fields is simply that the top (and bottom) people, for biological reasons, tend to be male. There is a whole narrative, much loved by many on the political right, that says that this is an uncomfortable truth that liberals find so difficult to accept that they will do anything to suppress it. There is also a counter-narrative that says that people on the far right keep on trying to push discredited claims about the genetic basis for intelligence, differences amongst various groups, and so on, in order to claim that disadvantaged groups are innately disadvantaged rather than disadvantaged by external circumstances.

I myself, as will be obvious, incline towards the liberal side, but I also care about scientific integrity, so I felt I couldn’t just assume that the paper in question had been rightly suppressed. I read an article by the author that described the whole story (in Quillette, which rather specializes in this kind of story), and it sounded rather shocking, though one has to bear in mind that since the article is written by a disgruntled author, there is almost certainly another side to the story. In particular, he is at pains to stress that the paper is simply a mathematical theory to explain why one sex might evolve to become more variable than another, and not a claim that the theory applies to any given species or trait. In his words, “Darwin had also raised the question of why males in many species might have evolved to be more variable than females, and when I learned that the answer to his question remained elusive, I set out to look for a scientific explanation. My aim was not to prove or disprove that the hypothesis applies to human intelligence or to any other specific traits or species, but simply to discover a logical reason that could help explain how gender differences in variability might naturally arise in the same species.”

So as I understood the situation, the paper made no claims whatsoever about the real world, but simply defined a mathematical model and proved that *in this model* there would be a tendency for greater variability to evolve in one sex. Suppressing such a paper appeared to make no sense at all, since one could always question whether the model was realistic. Furthermore, suppressing papers on this kind of topic simply plays into the hands of those who claim that liberals are against free speech, that science is not after all objective, and so on, claims that are widely believed and do a lot of damage.

I was therefore prompted to look at the paper itself, which is on the arXiv, and there I was met by a surprise. I was worried that I would find it convincing, but in fact I found it so unconvincing that I think it was a bad mistake by the Mathematical Intelligencer and the New York Journal of Mathematics to accept it, but for reasons of mathematical quality rather than for any controversy that might arise from it. To put that point more directly, if somebody came up with a plausible model (I don’t insist that it should be clearly correct) and showed that subject to certain assumptions about males and females one would expect greater variability to evolve amongst males, then that might well be interesting enough to publish, and certainly shouldn’t be suppressed just because it might be uncomfortable, though for all sorts of reasons that I’ll discuss briefly later, I don’t think it would be as uncomfortable as all that. But this paper appears to me to fall well short of that standard.

To justify this view, let me try to describe what the paper does. Its argument can be summarized as follows.

1. Because in many species females have to spend a lot more time nurturing their offspring than males, they have more reason to be very careful when choosing a mate, since a bad choice will have more significant consequences.

2. If one sex is more selective than the other, then the less selective sex will tend to become more variable.

To make that work, one must of course define some kind of probabilistic model in which the words “selective” and “variable” have precise mathematical definitions. What might one expect these to be? If I hadn’t looked at the paper, I think I’d have gone for something like this. An individual of one sex will try to choose as desirable a mate as possible amongst potential mates that would be ready to accept as a mate. To be more selective would simply mean to make more of an effort to optimize the mate, which one would model in some suitable probabilistic way. One feature of this model would presumably be that a less attractive individual would typically be able to attract less desirable mates.

I won’t discuss how variability is defined, except to say that the definition is, as far as I can see, reasonable. (For normal distributions it agrees with standard deviation.)

The definition of selectivity in the paper is extremely crude. The model is that individuals of one sex will mate with individuals of the other sex if and only if they are above a certain percentile in the desirability scale, a percentile that is the same for everybody. For instance, they might only be prepared to choose a mate who is in the top quarter, or the top two thirds. The higher the percentile they insist on, the more selective that sex is.

When applied to humans, this model is ludicrously implausible. While it is true that some males have trouble finding a mate, the idea that some huge percentage of males are simply not desirable enough (as we shall see, the paper requires this percentage to be over 50) to have a chance of reproducing bears no relation to the world as we know it.

I suppose it is just about possible that an assumption like this could be true of some species, or even of our cave-dwelling ancestors — perhaps men were prepared to shag pretty well anybody, but only some small percentage of particularly hunky men got their way with women — but that isn’t the end of what I find dubious about the paper. And even if we were to accept that something like that had been the case, it would be a huge further leap to assume that what made somebody desirable hundreds of thousands of years ago was significantly related to what makes somebody good at, say, mathematical research today.

Here is one of the main theorems of the paper, with a sketch of the proof. Suppose you have two subpopulations and within one of the two sexes, with being of more varied attractiveness than . And suppose that the selectivity cutoff for the other sex is that you have to be in the top 40 percent attractiveness-wise. Then because is more concentrated on the extremes than , a higher proportion of subpopulation will be in that percentile. (This can easily be made rigorous using the notion of variability in the paper.) By contrast, if the selectivity cutoff is that you have to be in the top 60 percent, then a higher proportion of subpopulation will be chosen.
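The tail comparison in this sketch is elementary to check numerically. The snippet below is a minimal illustration (not the paper's model; the means, standard deviations and cutoffs are illustrative choices): two normal subpopulations with the same mean, one more variable than the other, compared above a cutoff placed either above or below the common mean.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def frac_above(threshold, mean=0.0, sd=1.0):
    """Fraction of a N(mean, sd^2) subpopulation lying above the threshold."""
    return 1.0 - phi((threshold - mean) / sd)

# Subpopulation A (sd 1, less variable) and B (sd 2, more variable),
# both with the same mean desirability 0.

# "Selective" case: the cutoff sits above the common mean, so the more
# variable subpopulation B contributes a larger fraction.
assert frac_above(0.25, sd=2.0) > frac_above(0.25, sd=1.0)

# "Non-selective" case: the cutoff sits below the common mean, and the
# comparison reverses.
assert frac_above(-0.25, sd=2.0) < frac_above(-0.25, sd=1.0)
```

This reproduces exactly the dichotomy in the theorem: which subpopulation is favoured depends only on whether the cutoff lies above or below the common mean.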

I think we are supposed to conclude that subpopulation is therefore favoured over subpopulation when the other sex is selective, and not otherwise, and therefore that variability amongst males tends to be selected for, because females tend to be more choosy about their mates.

But there is something very odd about this. Those poor individuals at the bottom of population aren’t going to reproduce, so won’t they die out and potentially cause population to become *less* variable? Here’s what the paper has to say.

Thus, in this discrete-time setting, if one sex remains selective from each generation to the next, for example, then in each successive generation more variable subpopulations of the opposite sex will prevail over less variable subpopulations with comparable average desirability. Although the desirability distributions themselves may evolve, if greater variability prevails at each step, that suggests that over time the opposite sex will tend toward greater variability.

Well I’m afraid that to me it doesn’t suggest anything of the kind. If females have a higher cutoff than males, wouldn’t that suggest that males would have a much higher selection pressure to become more desirable than females? And wouldn’t the loss of all those undesirable males mean that there wasn’t much one could say about variability? Imagine for example if the individuals in were all either extremely fit or extremely unfit. Surely the variability would go right down if only the fit individuals got to reproduce. And if you’re worrying that the model would in fact show that males would tend to become far superior to females, as opposed to the usual claim that males are more spread out both at the top and at the bottom, let’s remember that males inherit traits from both their fathers and their mothers, as do females, an observation that, surprisingly, plays no role at all in the paper.

What is the purpose of the strange idea of splitting into two subpopulations and then ignoring the fact that the distributions may evolve (and why just “may” — surely “will” would be more appropriate)? Perhaps the idea is that a typical gene (or combination of genes) gives rise not to qualities such as strength or intelligence, but to more obscure features that express themselves unpredictably — they don’t necessarily make you stronger, for instance, but they give you a bigger range of strength possibilities. But is there the slightest evidence for such a hypothesis? If not, then why not just consider the population as a whole? My guess is that you just don’t get the desired conclusion if you do that.

I admit that I have not spent as long thinking about the paper as I would need to in order to be 100% confident of my criticisms. I am also far from expert in evolutionary biology and may therefore have committed some rookie errors in what I have written above. So I’m prepared to change my mind if somebody (perhaps the author?) can explain why the criticisms are invalid. But as it looks to me at the time of writing, the paper isn’t a convincing model, and even if one accepts the model, the conclusion drawn from the main theorem is not properly established. Apparently the paper had a very positive referee’s report. The only explanation I can think of for that is that it was written by somebody who worked in evolutionary biology, didn’t really understand mathematics, and was simply pleased to have what looked like a rigorous mathematical backing for their theories. But that is pure speculation on my part and could be wrong.

I said earlier that I don’t think one should be so afraid of the genetic variability hypothesis that one feels obliged to dismiss all the literature that claims to have observed greater variability amongst males. For all I know it is seriously flawed, but I don’t want to have to rely on that in order to cling desperately to my liberal values.

So let’s just suppose that it really is the case that amongst a large number of important traits, males and females have similar averages but males appear more at the extremes of the distribution. Would that help to explain the fact that, for example, the proportion of women decreases as one moves up the university hierarchy in mathematics, as Larry Summers once caused huge controversy by suggesting? (It’s worth looking him up on Wikipedia to read his exact words, which are more tentative than I had realized.)

The theory might appear to fit the facts quite well: if men and women are both normally distributed with the same mean but men have a greater variance than women, then a randomly selected individual from the top percent of the population will be more and more likely to be male the smaller gets. That’s just simple mathematics.
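That "simple mathematics" is easy to make concrete. Here is a small sketch (the means, standard deviations and thresholds are illustrative choices, not data): with equal means and a slightly larger male variance, the male share of the top tail exceeds one half and grows as the cutoff moves further out.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Illustrative choice: equal means, male sd 10% larger than female sd.
SD_M, SD_F = 1.1, 1.0

def male_share_above(t):
    """Share of males among individuals above threshold t,
    assuming equal-sized sexes, both distributed N(0, sd^2)."""
    m = 1.0 - phi(t / SD_M)
    f = 1.0 - phi(t / SD_F)
    return m / (m + f)

# The male share exceeds one half and increases with the cutoff.
shares = [male_share_above(t) for t in (1.0, 2.0, 3.0, 4.0)]
assert shares[0] > 0.5
assert all(a < b for a, b in zip(shares, shares[1:]))
```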

But it is nothing like enough reason to declare the theory correct. For one thing, it is just as easy to come up with an environmental theory that would make a similar prediction. Let us suppose that the way society is organized makes it harder for women to become successful mathematicians than for men. There are all sorts of reasons to believe that this is the case: relative lack of role models, an expectation that mathematics is a masculine pursuit, more disruption from family life (on average), distressing behaviour by certain male colleagues, and so on. Let’s suppose that the result of all these factors is that the distribution of whatever it takes for women to make a success of mathematics has a slightly lower mean than that for men, but roughly the same variance, with both distributions normal. Then again one finds by very basic mathematics that if one picks a random individual from the top percent, that individual will be more and more likely to be male as gets smaller. But in this case, instead of throwing up our hands and saying that we can’t fight against biology, we will say that we should do everything we can to compensate for and eventually get rid of the disadvantages experienced by women.
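A companion sketch makes the point that the environmental story yields the same qualitative prediction (again with illustrative numbers): shift the *effective* female mean down slightly, keep the variances equal, and the male share of the top tail again grows as the cutoff moves out.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Illustrative choice: equal variances; the *effective* female mean is
# slightly lower, standing in for environmental disadvantage rather
# than any innate difference.
MEAN_M, MEAN_F, SD = 0.1, 0.0, 1.0

def male_share_above(t):
    """Share of males among individuals above threshold t,
    assuming equal-sized sexes and normal distributions."""
    m = 1.0 - phi((t - MEAN_M) / SD)
    f = 1.0 - phi((t - MEAN_F) / SD)
    return m / (m + f)

# Same qualitative prediction as the variance story: the male share
# grows as the cutoff moves further into the tail.
shares = [male_share_above(t) for t in (1.0, 2.0, 3.0, 4.0)]
assert all(a < b for a, b in zip(shares, shares[1:]))
```

So the observed imbalance at the top, by itself, cannot distinguish between the two explanations.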

A second reason to be sceptical of the theory is that it depends on the idea that how good one is at mathematics is a question of raw brainpower. But that is a damaging myth that puts many people off doing mathematics who could have enjoyed it and thrived at it. I have often come across students who astound me with their ability to solve problems far more quickly than I can (not all of them male). Some of them go on to be extremely successful mathematicians, but not all. And some who seem quite ordinary go on to do extraordinary things later on. It is clear that while an unusual level of raw brainpower, whatever that might be, often helps, it is far from necessary and far from sufficient for becoming a successful mathematician: it is part of a mix that includes dedication, hard work, enthusiasm, and often a big slice of luck. And as one gains in experience, one gains in brainpower — not raw any more, but who cares whether it is hardware or software? So *even if* it turned out that the genetic variability hypothesis was correct and could be applied to something called raw mathematical brainpower, a conclusion that would be very hard to establish convincingly (it’s certainly not enough to point out that males find it easier to visualize rotating 3D objects in their heads), that *still* wouldn’t imply that it is pointless to try to correct the underrepresentation of women amongst the higher ranks of mathematicians. When I was a child, almost all doctors and lawyers were men, and during my lifetime I have seen that change completely. The gender imbalance amongst mathematicians has changed more slowly, but there is no reason in principle that the pace couldn’t pick up substantially. I hope to live to see that happen.

### Polymath15, tenth thread: numerics update

This is the tenth “research” thread of the Polymath15 project to upper bound the de Bruijn-Newman constant , continuing this post. Discussion of the project of a non-research nature can continue for now in the existing proposal thread. Progress will be summarised at this Polymath wiki page.

Most of the progress since the last thread has been on the numerical side, in which the various techniques to numerically establish zero-free regions to the equation have been streamlined, made faster, and extended to larger heights than were previously possible. The best bound for now depends on the height to which one is willing to assume the Riemann hypothesis. Using the conservative verification up to height (slightly larger than) , which has been confirmed by independent work of Platt et al. and Gourdon-Demichel, the best bound remains at . Using the verification up to height claimed by Gourdon-Demichel, this improves slightly to , and if one assumes the Riemann hypothesis up to height the bound improves to , contingent on a numerical computation that is still underway. (See the table below the fold for more data of this form.) This is broadly consistent with the expectation that the bound on should be inversely proportional to the logarithm of the height at which the Riemann hypothesis is verified.

As progress seems to have stabilised, it may be time to transition to the writing phase of the Polymath15 project. (There are still some interesting research questions to pursue, such as numerically investigating the zeroes of for negative values of , but the writeup does not necessarily have to contain every single direction pursued in the project. If enough additional interesting findings are unearthed then one could always consider writing a second paper, for instance.)

Below the fold is the detailed progress report on the numerics by Rudolph Dwars and Kalpesh Muchhal.

** — Quick recap — **

The effectively bounded and normalised Riemann-Siegel type asymptotic approximation for :

enables us to explore its complex zeros and to establish zero-free regions. By choosing a promising combination and , and then numerically and analytically showing that the right-hand side doesn’t vanish in the rectangular shaped “canopy” (or a point on the blue hyperbola), a new DBN upper bound will be established. Summarized in this visual:

**— The Barrier approach —**

To verify that in such a rectangular strip, we have adopted the so-called Barrier-approach, which comprises three stages (illustrated in a picture below):

- Use the numerical verification work of the RH already done by others. Independent teams have now verified the RH up to , and a single study took it up to . This work allows us to rule out, up to a certain , that a complex zero has flown through the critical strip into any defined canopy. To also cover the x-domains that lie beyond these known verifications, we have to assume the RH up to . This will then yield a that is conditional on this assumption.
- Complex zeros could also have horizontally flown into the ‘forbidden tunnel’ at high velocity. To numerically verify this hasn’t occurred, a Barrier needs to be introduced at and checked for any zeros having flown around, through or over it.
- Verifying the range (or ) is done through testing that the lower bound of always stays higher than the upper bound of the error terms. This has to be done numerically up to a certain point , after which analytical proof takes over.

So, new numerical computations are required to verify that both the Barrier at and the non-analytical part of the range are zero-free for a certain choice of .

** — Verifying the Barrier is zero-free —**

So, how to numerically verify that the Barrier is zero-free?

- The Barrier is required to have two nearby screens at and to ensure that no complex zeros could fly around it. Hence, it has the 3D structure: .
- For the numerical verification that the Barrier is zero-free, it is treated as a ‘pile’ of rectangles. For each rectangle the winding number is computed using the argument principle and Rouché’s theorem.
- For each rectangle, the number of mesh points required is decided using the -derivative, and the t-step is decided using the -derivative.
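As an illustration of the rectangle-by-rectangle zero counting (a generic sketch with a toy function, not the actual computation from the writeup), one can accumulate argument changes of f along a fixed mesh on the contour and read off the winding number:

```python
import cmath
import math

def winding_number(f, x0, x1, y0, y1, n=400):
    """Winding number of f around the rectangle [x0,x1] x [y0,y1],
    computed by accumulating argument changes of f along a fixed mesh
    on the contour (argument principle).  Assumes f has no zeros on
    or very near the contour, and that the mesh is fine enough that
    each argument increment is below pi in absolute value."""
    pts = []
    for k in range(n):  # bottom edge, left to right
        pts.append(complex(x0 + (x1 - x0) * k / n, y0))
    for k in range(n):  # right edge, bottom to top
        pts.append(complex(x1, y0 + (y1 - y0) * k / n))
    for k in range(n):  # top edge, right to left
        pts.append(complex(x1 - (x1 - x0) * k / n, y1))
    for k in range(n):  # left edge, top to bottom
        pts.append(complex(x0, y1 - (y1 - y0) * k / n))
    vals = [f(z) for z in pts]
    total = 0.0
    for a, b in zip(vals, vals[1:] + vals[:1]):
        total += cmath.phase(b / a)  # argument increment, in (-pi, pi]
    return round(total / (2.0 * math.pi))

# Toy check: a polynomial with one zero inside the box and one outside.
f = lambda z: (z - (1 + 1j)) * (z - (3 + 0.5j))
assert winding_number(f, 0.0, 2.0, 0.0, 2.0) == 1
```

A winding number of zero for every rectangle in the "pile" certifies that no zero lies inside the Barrier.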

__Optimizations used for the barrier computations__

- To efficiently calculate all required mesh points of on the rectangle sides, we used a pre-calculated stored sum matrix that is Taylor expanded in the and -directions. The resulting polynomial is used to calculate the required mesh points. The formula for the stored sum matrix:

with and , where and are the number of Taylor expansion terms required to achieve the required level of accuracy (in our computations we used 20 digits and an algorithm to automatically determine and ).

- We found that a more careful placement of the Barrier at an makes a significant difference in the computation time required. A good location is where has a large relative magnitude. Since retains some Euler product structure, such locations can be quickly guessed by evaluating a certain Euler product up to a small number of primes, for multiple X candidates in an X range.
- Since and have smooth, i.e. non-oscillatory, behavior, using conservative numeric integrals with the Lemma 9.3 summands, , instead of the actual summation is feasible and significantly faster (the time complexity of the estimation becomes independent of ).
- Using a fixed mesh for a rectangle contour (can change from rectangle to rectangle) allows for vectorized computations and is significantly faster than using an adaptive mesh. To determine the number of mesh points, it is assumed that will stay above 1 (which is expected given the way the X location has been chosen, and is later verified after has been computed at all the mesh points). The number is chosen as
- Since for the above fixed mesh generally comes way above 1, the lower bound along the entire contour (not just on the mesh points) is higher than what would be the case with an adaptive mesh. This property is used to obtain a larger t-step while moving in the t-direction.
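The location-guessing heuristic from the first bullet can be sketched as follows. The mapping from the candidate X (and the height y) to the argument s below is a hypothetical stand-in, not the writeup's exact normalisation; only the idea of scoring candidates by a partial Euler product is taken from the text.

```python
def partial_euler_product(s, primes):
    """|prod over p of (1 - p^(-s))^(-1)| for a few small primes: a
    cheap proxy for the relative magnitude of a zeta-like Dirichlet
    sum (the precise normalisation used in the project differs)."""
    prod = complex(1.0, 0.0)
    for p in primes:
        prod *= 1.0 / (1.0 - p ** (-s))
    return abs(prod)

SMALL_PRIMES = (2, 3, 5, 7, 11, 13)

def best_barrier_location(x_candidates, y=0.3):
    """Score each candidate X by the partial Euler product and keep
    the largest.  The formula for s is an assumed normalisation."""
    def score(x):
        s = complex((1.0 + y) / 2.0, x / 2.0)  # hypothetical stand-in
        return partial_euler_product(s, SMALL_PRIMES)
    return max(x_candidates, key=score)

candidates = [6.0e10 + 10.0 * k for k in range(20)]
assert best_barrier_location(candidates) in candidates
```

Since only a handful of primes enter the product, scanning a whole X range in this way is essentially free compared to a single full Barrier computation.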

** — Verifying the range —**

This leaves us with ensuring the range (where is the value of corresponding to the barrier ) is zero-free through checking that for each , the lower bound always exceeds the upper bound of the error terms.

- From theory, two lower bounds are available: the Lemma-bound (eq. 80 in the writeup) and an approximate Triangle bound (eq. 79 in the writeup). Both bounds can be ‘mollified’ by choosing an increasing number of primes (to a certain extent) until the bound is sufficiently positive.
- The Lemma bound is used to find the number of ‘mollifiers’ required to make the bound positive at . We found that using primes was the maximum number of primes still allowing acceptable computational performance.
- The approximate Triangle bound evaluates faster and is used to establish the mollified (either 0 primes or only prime 2) end point before the analytical lower bound takes over.
- The Lemma-bound is then also used to calculate that for each in , the lower bound stays sufficiently above the error terms. The Lemma bound only needs to be verified for the line segment , since the Lemma bound monotonically increases when goes to 1.

__Optimizations used for Lemmabound calculations__

- To speed up computations a fast “sawtooth” mechanism has been developed. This only calculates the minimally required incremental Lemma bound terms and only induces a full calculation when the incremental bound goes below a defined threshold (that is sufficiently above the error bounds).

where

(as presented within section 9 of the writeup, pg. 42)
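The sawtooth mechanism can be sketched generically as follows; full_bound and increment are placeholders for the actual Lemma bound and its incremental terms, and the toy bounds at the end are illustrative only.

```python
def sawtooth_scan(full_bound, increment, n_start, n_end, threshold):
    """Sawtooth scheme (generic sketch): carry a cheap incremental
    lower bound from one N to the next, and only pay for a full
    recomputation when the carried bound dips below the threshold.
    full_bound(n):  expensive, accurate lower bound at n;
    increment(n):   cheap upper bound on the drop from n-1 to n.
    Returns the number of full recomputations performed; raises if
    the bound genuinely falls below the threshold."""
    recomputations = 1
    bound = full_bound(n_start)
    for n in range(n_start + 1, n_end + 1):
        bound -= increment(n)            # cheap pessimistic update
        if bound < threshold:            # safety margin breached:
            bound = full_bound(n)        # recompute exactly
            recomputations += 1
            if bound < threshold:
                raise ValueError("lower bound fails at N = %d" % n)
    return recomputations

# Toy model: the true lower bound decays slowly, like 5 + 100/n, and
# the cheap per-step drop estimate slightly over-estimates the decay.
true_bound = lambda n: 5.0 + 100.0 / n
drop = lambda n: 100.0 / n - 100.0 / (n + 1) + 0.01

full_recomputes = sawtooth_scan(true_bound, drop, 100, 2000, 5.0)
assert 1 <= full_recomputes < 1901  # far fewer than one per step
```

The saving comes from the shape of the resulting graph: a slow incremental descent punctuated by occasional jumps back up at each full recomputation, hence the name.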

** — Software used —**

To accommodate the above, the following software has been developed in both **pari/gp** (__https://pari.math.u-bordeaux.fr__) and **ARB** (__http://arblib.org__):

For verifying the Barrier:

- __Barrier_Location_Optimizer__ to find the optimal location to place the Barrier.
- __Stored_Sums_Generator__ to generate, in matrix form, the coefficients of the Taylor polynomial. This is a one-off activity for a given , post which the coefficients can be used for winding number computations in different and ranges.
- __Winding_Number_Calculator__ to verify that no complex zeros passed the Barrier.

For verifying the range:

- __N_b_Location_Finder__ for the number of mollifiers to make the bound positive.
- __Lemmabound_calculator__
- __LemmaBound_Sawtooth_calculator__ to verify each incrementally calculated Lemma bound stays above the error bounds. Generally this script and the Lemmabound_calculator script are substitutes for each other, although the latter may also be used for some initial portion of the N range.

Furthermore we have developed software to compute:

- as and/or .
- the exact value (using the bounded version of the 3rd integral approach).

The software supports parallel processing through multi-threading and grid computing.

** — Results achieved —**

For various combinations of , these are the numerical outcomes:

The numbers suggest that we now have numerically verified that (even at two different Barrier locations). Also, conditionally on the RH being verified up to various , we have now reached a . We are cautiously optimistic that the tools available right now even bring a conditional within reach of computation.

** — Timings for verifying DBN —**

| Procedure | Timings |
| --- | --- |
| Winding number check in the barrier for t=[0,0.2], y=[0.2,1] | 42 sec |
| Lemma bounds using incremental method for N=[69098, 250000] and a 4-prime mollifier {2,3,5,7} | 118 sec |
| Overall | ~200 sec |

Remarks:

- *Timings to be multiplied by a factor of ~3.2 for each incremental order of magnitude of x.*
- *Parallel processing significantly improves speed (e.g. Stored sums was done in < 7 sec).*
- *Mollifier 2 analytical bound at is .*

** — Links to computational results and software used: —**

*Numerical results achieved:*

- Stored sums https://github.com/km-git-acc/dbn_upper_bound/tree/master/output/storedsums
- Winding numbers https://github.com/km-git-acc/dbn_upper_bound/tree/master/output/windingnumbers

- Lemmabound N_a…N_b https://github.com/km-git-acc/dbn_upper_bound/tree/master/output/eulerbounds

*Software scripts used:*

### Benoist’s minicourse “Arithmeticity of discrete groups”. Lecture II: Closedness of the L-orbits

As it was announced in the end of the first post of this series, we will discuss today the first half of the proof of the following result:

**Theorem 1** *Let , , and . Suppose that is a discrete and Zariski dense subgroup of such that is cocompact. Then, is commensurable to some -form of .*

**Remark 1** *This statement is originally due to Hee Oh, but the proof below is a particular case of Benoist–Miquel’s arguments. In particular, our subsequent discussions can be generalized to obtain the statement of Theorem 1 of the previous post in full generality.*

**Remark 2** *Theorem 1 is not true without the higher rank assumption (i.e., ): indeed, has infinite index in .*

Our task is to construct a -form satisfying the conclusions of Theorem 1. This is not very easy because it must cover all possible cases of -forms such as:

**Example 1**

- ;
- where are integers in a division algebra over ;
- “”, i.e., where is Galois conjugation (and is the transpose of ).

Before trying to construct adequate -forms, let us make some preliminary reductions.

We denote by the *parabolic subgroup* normalizing in : more concretely,

where .

Next, we consider . In the literature, is called an opposite horospherical subgroup to .

Since is Zariski dense in , there exists such that (i.e., ). By taking a basis of such that , we have that

In particular, is cocompact in (thanks to the assumptions of Theorem 1).

**Remark 3** *This is one of the few places in the Benoist–Miquel argument where the Zariski denseness of is used.*

**Remark 4** *In general, the argument above works when is reflexive, that is, is conjugated to an opposite horospherical subgroup .*

We denote by an opposite parabolic subgroup, and

the common Levi subgroup of and . In particular, we have decompositions (in semi-direct products)

Let and be the Lie algebras of and . Note that and (resp.) are lattices in and (resp.). In other terms,

where is the space of lattices in .

**Remark 5** *Note that for all in the context of the example and .*

Observe that is the intersection of the normalizers of and . Therefore, acts on the spaces of lattices and via the adjoint map (i.e., by conjugation).

As it turns out, the key step towards the proof of Theorem 1 consists in showing that the -orbits of and are closed. In other terms, the proof of Theorem 1 can be divided into two parts:

- closedness of the -orbits of and ;
- construction of the -form based on the closedness of the -orbits above.

In the remainder of this post, we shall establish the closedness of the relevant -orbits. Then, the next post of this series will be dedicated to obtaining an adequate -form (i.e., arithmeticity) from this closedness property.

**Remark 6** *Hee Oh’s original argument used Ratner’s theory for the semi-simple part of to derive the desired closedness property. The drawback of this strategy is that it doesn’t allow one to treat some cases (such as ), and, for this reason, Benoist and Miquel are forced to proceed along the lines below.*

**1. Closedness of the -orbit of **

Consider Bruhat’s decomposition (where is the Lie algebra of ) and the corresponding projection .

Given , set , i.e.,

and consider the Zariski open set

Our first step towards the closedness of the -orbit of is to exploit the discreteness of and the commutativity of in order to get that the actions of the matrices , , on the vectors of the lattice do not produce arbitrarily short vectors:

**Proposition 2** *The set is closed and discrete in .*

*Proof:* Let with , , , and such that .

Our task is to show that for all sufficiently large.

For this sake, note that a direct calculation reveals that for all and . In particular, . Now, we use the cocompactness of to write with and (modulo taking subsequences).

By definition, . Since is *commutative*, and hence

as . Because is discrete, it follows that for all sufficiently large. Therefore, for all sufficiently large. This completes the proof of the proposition.

**Remark 7** *The fact that is commutative plays a key role in the proof of this proposition.*

Next, we shall combine this proposition with Mahler’s compactness criterion to study the set of determinants of the matrices for .

**Proposition 3** *Let for . Then, the set is closed and discrete in .*

*Proof:* Given such that , we want to show that for all sufficiently large.

By contradiction, let us assume that this is not the case. In particular, there is a subsequence with , i.e., , and also for all . Note that, by definition, is the covolume of .

By Proposition 2, doesn’t have small vectors. Since these lattices also have bounded covolumes (because ), we can invoke Mahler’s compactness criterion to extract a subsequence such that . In this setting, Proposition 2 says that we must have for all sufficiently large, so that for all sufficiently large, a contradiction.
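The two quantities driving the argument above, namely the covolume of a lattice and the length of its shortest nonzero vector (which is what Mahler's compactness criterion controls), can be computed in a toy two-dimensional example; the basis below is an arbitrary illustration, not taken from the text:

```python
import itertools
import numpy as np

def covolume(basis):
    """Covolume of the lattice spanned by the rows of `basis`, i.e. |det|."""
    return abs(np.linalg.det(basis))

def shortest_vector_length(basis, search_range=5):
    """Brute-force the shortest nonzero lattice vector over small coefficients."""
    dim = basis.shape[0]
    best = float("inf")
    for coeffs in itertools.product(range(-search_range, search_range + 1), repeat=dim):
        if any(coeffs):
            v = np.array(coeffs) @ basis
            best = min(best, float(np.linalg.norm(v)))
    return best

B = np.array([[1.0, 0.0], [0.5, 2.0]])  # rows form a basis of the lattice
print(covolume(B))                 # 2.0
print(shortest_vector_length(B))   # 1.0, attained by the basis vector (1, 0)
```

Mahler's criterion says precisely that a family of lattices with covolume bounded above and shortest vector bounded below lies in a compact set, which is how it enters the proof of Proposition 3.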

Now, we shall modify to obtain a polynomial function on . For this sake, we recall the element introduced in (1) (conjugating and ). In this context,

is a polynomial function of . Moreover, an immediate consequence of Proposition 3 is:

**Corollary 4** * is a closed and discrete subset of .*

The polynomial is relevant to our discussion because it is intimately connected to the action of :

- on one hand, a straightforward calculation reveals that is proportional to for all (i.e., for some [explicit] ): see Lemma 3.12 of the Benoist–Miquel paper;
- on the other hand, some purely algebraic considerations show that is the virtual stabilizer of the proportionality class of , i.e., is a finite-index subgroup of : see Proposition 3.13 of the Benoist–Miquel paper.

At this stage, we are ready to establish the closedness of the -orbit of :

**Theorem 5** *The -orbit of is closed in .*

*Proof:* Given such that , we write with converging to the identity element .

Our task reduces to showing that for all sufficiently large. For this sake, it suffices to find stabilizing the proportionality class of the polynomial (thanks to Remark 8).

In this direction, we take . By definition, . Also, by Remark 8, we know that for some constant depending on the covolume of . In particular, .

As it turns out, since the lattices converge to , one can check that the quantities converge to some related to the covolume of . Moreover, because . Hence, we can apply Corollary 4 to deduce that and

for all sufficiently large depending on , say .

At this point, we observe that the degrees of the polynomials are *uniformly bounded* and is Zariski-dense in . Thus, we can choose such that

for all and .

In other terms, stabilizes the proportionality class of for all . It follows from Remark 8 that for all sufficiently large. This completes the argument.

**2. Closedness of the -orbit of **

The proof of the fact that the -orbit of is closed in follows the same ideas from the previous section: one introduces the polynomial for , one shows that is closed and discrete in , and one exploits this information to get the desired conclusion.

In particular, our discussion of the first half of the proof of Theorem 1 is complete. Next time, we will see how this information can be used to derive the arithmeticity of . We end this post with the following remark:

**Remark 9** *Roughly speaking, we covered Section 3 of the Benoist–Miquel article (and the reader is invited to consult it for more details about all results mentioned above). Finally, a closer inspection of the arguments shows that the statements are true in greater generality provided is reflexive and commutative (cf. Remarks 4 and 7).*

### 254A, Notes 0: Physical derivation of the incompressible Euler and Navier-Stokes equations

This coming fall quarter, I am teaching a class on topics in the mathematical theory of incompressible fluid equations, focusing particularly on the incompressible Euler and Navier-Stokes equations. These two equations are by no means the only equations used to model fluids, but I will concentrate on these two equations in this course to narrow the focus down to something manageable. I have not fully decided on the choice of topics to cover in this course, but I would probably begin with some core topics such as local well-posedness theory and blowup criteria, conservation laws, and construction of weak solutions, then move on to some topics such as boundary layers and the Prandtl equations, the Euler–Poincaré–Arnold interpretation of the Euler equations as an infinite dimensional geodesic flow, and some discussion of the Onsager conjecture. I will probably also continue on to more advanced and recent topics in the winter quarter.

In this initial set of notes, we begin by reviewing the physical derivation of the Euler and Navier-Stokes equations from the first principles of Newtonian mechanics, and specifically from Newton’s famous three laws of motion. Strictly speaking, this derivation is not needed for the mathematical analysis of these equations, which can be viewed if one wishes as an arbitrarily chosen system of partial differential equations without any physical motivation; however, I feel that the derivation sheds some insight and intuition on these equations, and is also worth knowing on purely intellectual grounds regardless of its mathematical consequences. I also find it instructive to actually see the journey from Newton’s law

to the seemingly rather different-looking law

for incompressible Navier-Stokes (or, if one drops the viscosity term , the Euler equations).

Our discussion in this set of notes is physical rather than mathematical, and so we will not be working at mathematical levels of rigour and precision. In particular we will be fairly casual about interchanging summations, limits, and integrals, we will manipulate approximate identities as if they were exact identities (e.g., by differentiating both sides of the approximate identity), and we will not attempt to verify any regularity or convergence hypotheses in the expressions being manipulated. (The same holds for the exercises in this text, which also do not need to be justified at mathematical levels of rigour.) Of course, once we resume the mathematical portion of this course in subsequent notes, such issues will be an important focus of careful attention. This is a basic division of labour in mathematical modeling: non-rigorous heuristic reasoning is used to derive a mathematical model from physical (or other “real-life”) principles, but once a precise model is obtained, the analysis of that model should be completely rigorous if at all possible (even if this requires applying the model to regimes which do not correspond to the original physical motivation of that model). See the discussion by John Ball quoted at the end of these slides of Gero Friesecke for an expansion of these points.

Note: our treatment here will differ slightly from that presented in many fluid mechanics texts, in that it will emphasise first-principles derivations from many-particle systems, rather than relying on bulk laws of physics, such as the laws of thermodynamics, which we will not cover here. (However, the derivations from bulk laws tend to be more robust, in that they are not as reliant on assumptions about the particular interactions between particles. In particular, the physical hypotheses we assume in this post are probably quite a bit stronger than the minimal assumptions needed to justify the Euler or Navier-Stokes equations, which can hold even in situations in which one or more of the hypotheses assumed here break down.)

** — 1. From Newton’s laws to the Euler and Navier-Stokes equations — **

For obvious reasons, the derivation of the equations of fluid mechanics is customarily presented in the three dimensional setting (and sometimes also in the two-dimensional setting ), but actually the general dimensional case is not that much more difficult (and in some ways clearer, as it reveals that the derivation does not depend on any structures specific to three dimensions, such as the cross product), so for this derivation we will work in the spatial domain for arbitrary . One could also work with bounded domains , or periodic domains such as ; the derivation is basically the same, thanks to the local nature of the forces of fluid mechanics, except at the boundary where the situation is more subtle (and may be discussed in more detail in later posts). For sake of notational simplicity, we will assume that the time variable ranges over the entire real line ; again, since the laws of classical mechanics are local in time, one could just as well restrict to some sub-interval of this line, such as for some time .

Our starting point is Newton’s second law , which (partially) describes the motion of a particle of some fixed mass moving in the spatial domain . (Here we assume that the mass of a particle does not vary with time; in particular, our discussion will be purely non-relativistic in nature, though it is possible to derive a relativistic version of the Euler equations by variants of the arguments given here.) We write Newton’s second law as the ordinary differential equation

where is the trajectory of the particle (thus denotes the position of the particle at time ), and is the force applied to that particle. If we write for the coordinates of the vector-valued function , and similarly write for the components of , we therefore have

where we adopt in this section the convention that the indices are always understood to range from to .

If one has some collection of particles instead of a single particle, indexed by some set of labels (e.g. the numbers from to , if there are a finite number of particles; for unbounded domains such as one can also imagine situations in which is infinite), then for each , the particle has some mass , some trajectory (with components ), and some force applied (with components ); we thus have the equation of motion

or in components

In this section we adopt the convention that the indices are always understood to range over the set of labels for the particles; in particular, their role should not be confused with those of the coordinate indices .

Newton’s second law does not, by itself, completely specify the evolution of a system of particles, because it does not specify exactly how the forces depend on the current state of the system. For particles, the current state at a given time is given by the positions of all the particles , as well as their velocities ; we assume for simplicity that the particles have no further physical characteristics or internal structure of relevance that would require more state variables than these. (No higher derivatives need to be specified beyond the first, thanks to Newton’s second law. On the other hand, specifying position alone is insufficient to describe the state of the system; this was noticed as far back as Zeno in his paradox of the arrow, which in retrospect can be viewed as a precursor to Newton’s second law insofar as it demonstrated that the laws of motion needed to be second-order in time (in contrast, for instance, to Aristotelian physics, which was morally first-order in nature).) At a fundamental level, the dependency of forces on the current state is governed by the laws of physics for such forces; for instance, if the particles interact primarily through electrostatic forces, then one needs the laws of electrostatics to describe these forces. (In some cases, such as electromagnetic interactions, one cannot accurately model the situation purely in terms of interacting particles, and the equations of motion will then involve some additional mediating fields such as the electromagnetic field; but we will ignore this possibility in the current discussion for sake of simplicity.)

Fortunately, thanks to other laws of physics, and in particular Newton’s other two laws of motion, one can still obtain partial information about the forces without having to analyse the fundamental laws producing these forces. For instance, Newton’s first law of motion (when combined with the second) tells us that a single particle does not exert any force on itself; the net force on only arises from interaction with other particles , (for this discussion we neglect external forces, such as gravity, although one could easily incorporate such forces into this discussion; see Exercise 3 below). We will assume that the only forces present are pair interactions coming from individual pairs of particles ; it is theoretically possible that one could have more complicated interactions between, say, a triplet of particles that do not simply arise from the interactions between the three pairs , , , but we will not consider this possibility here. We also assume that the net force on a particle is just the sum of all the interacting forces (i.e., the force addition law contains no nonlinear terms). This gives us a decomposition

of the net force on a particle into the interaction force exerted on by another particle . Thus the equation of motion is now

Of course, this description is still incomplete, because we have not specified exactly what the interaction forces are. But one important constraint on these forces is provided by Newton’s third law

This already gives some restrictions on the possible dynamics. For instance, it implies (formally, at least) that the total momentum

(which takes values in ) is conserved in time:

We will also assume that the interaction force between a pair of particles is parallel to the displacement between the pair; in other words, we assume the torque created by this force vanishes, thus

Here is the exterior product on (which in three dimensions can be transformed if one wishes to the cross product, but is well defined in all dimensions). Algebraically, is the universal alternating bilinear form on ; in terms of the standard basis of , the wedge product of two vectors , is given by

and the vector space is the formal span of the basic wedge products for .
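The coordinate description of the wedge product above is easy to compute with directly; a minimal sketch in numpy, representing x∧y by its components x_i y_j − x_j y_i for i < j (the helper name is ours):

```python
import numpy as np

def wedge(x, y):
    """Components x_i*y_j - x_j*y_i (i < j) of the exterior product x ^ y."""
    d = len(x)
    return {(i, j): float(x[i] * y[j] - x[j] * y[i])
            for i in range(d) for j in range(i + 1, d)}

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
print(wedge(x, y))  # {(0, 1): -3.0, (0, 2): -6.0, (1, 2): -3.0}
print(wedge(x, x))  # all zero: the form is alternating
```

In three dimensions these components are, up to signs and ordering, the entries of the cross product x × y.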

One consequence of the absence (4) of torque is the conservation of total angular momentum

(around the spatial origin ). Indeed, we may calculate

Note that there is nothing special about the spatial origin ; the angular momentum

around any other point is also conserved in time, as is clear from repeating the above calculation, or by combining the existing conservation laws for (3) and (5).
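The formal conservation of total momentum can also be watched numerically in a toy two-particle system: as long as the pair forces obey Newton's third law, the total momentum does not drift even under naive time-stepping. The linear attractive force law below is an arbitrary illustration:

```python
import numpy as np

m = np.array([1.0, 2.0])                           # particle masses
x = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])   # positions in R^3
v = np.array([[0.0, 1.0, 0.0], [0.0, -0.5, 0.0]])  # velocities

def pair_force(xi, xj):
    """Illustrative attractive force on particle i exerted by particle j."""
    return xj - xi

dt = 1e-3
p0 = (m[:, None] * v).sum(axis=0)      # initial total momentum
for _ in range(1000):
    f01 = pair_force(x[0], x[1])
    forces = np.array([f01, -f01])     # Newton's third law: F_10 = -F_01
    v = v + dt * forces / m[:, None]   # Newton's second law
    x = x + dt * v
p1 = (m[:, None] * v).sum(axis=0)
print(np.allclose(p0, p1))  # True: the equal-and-opposite forces cancel
```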

Now we pass from particle mechanics to continuum mechanics, by considering the limiting (bulk) behaviour of many particle systems as the number of particles per unit volume goes to infinity. (In physically realistic scenarios, will be comparable to Avogadro’s constant, which is large enough that such limits should be a good approximation to the truth.) To take such limits, we assume that the distribution of particles (and various properties of these particles, such as their velocities and net interaction forces) is approximated in a certain bulk sense by continuous fields. For instance, the mass distribution of a system of particles at a given time is given by the discrete measure

where denotes the Dirac measure (or distribution). We will assume that at each time , we have a “bulk” approximation

by some continuous measure , where the density function is some smooth function of time and space, and denotes Lebesgue measure on . What does bulk approximation mean? One could work with various notions of approximation, but we will adopt the viewpoint of the theory of distributions (as reviewed for instance in these old lecture notes of mine) and consider approximation against test functions in spacetime, thus we assume that

for all spacetime test functions . (One could also work with purely spatial test functions at each fixed time , or work with “infinitesimal” parallelepipeds or similar domains instead of using test functions; the arguments look slightly different when doing so, but the final equations of motion obtained are the same in all cases. See Exercise 1 for an example of this.) We will be deliberately vague as to what means, other than to say that the approximation should only be considered accurate (in the sense that it becomes exact in the limit ) when the test function varies at “macroscopic” (or “bulk”) spatial scales; in particular, it should not oscillate with a wavelength that goes to zero as goes to infinity. (For instance, one certainly expects the approximation (7) to break down if one tries to test it on scales comparable to the mean spacing between particles.)

Applying (6) and evaluating the delta integrations, the approximation (8) becomes

In a physical liquid, particles in a given small region of space tend to move at nearly identical velocities (as opposed to gases, where Brownian motion effects lead one to expect velocities to be distributed stochastically, for instance in a Maxwellian distribution). To model this, we assume that there exists a smooth *velocity field* for which we have the approximation

for all particles and all times . (When stochastic effects are significant, the continuum limit of the fluid will be the Boltzmann equations rather than the Euler or Navier-Stokes equations; however the latter equations can still emerge as an approximation of the former in various regimes. See also Remark 7 below.)

Implicit in our model of many-particle interactions is the conservation of mass: each particle has a fixed mass , and no particle is created or destroyed by the evolution. This conservation of mass, when combined with the approximations (9) and (10), gives rise to a certain differential equation relating the density function and the velocity field . To see this, first observe from the fundamental theorem of calculus in time that

or equivalently (after applying (6) and evaluating the delta integrations)

for any test function (note that we make compactly supported in both space and time). By the chain rule and (10), we have

for any particle and any time , where denotes the partial derivative with respect to the time variable, denotes the spatial gradient (with denoting partial differentiation in the coordinate), and denotes the Euclidean inner product. (One could also use notation here that avoids explicit use of Euclidean structure, for instance writing in place of , but it is traditional to use Euclidean notation in fluid mechanics.) This particular combination of derivatives appears so often in the subject that we will give it a special name, the material derivative :
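The chain-rule computation just described can be checked symbolically: along a trajectory solving dX/dt = u(t, X), differentiating a test function along the trajectory gives exactly the material derivative evaluated on the trajectory. A one-dimensional sketch with sympy, where the particular u, test function, and trajectory are arbitrary illustrative choices:

```python
import sympy as sp

t, x = sp.symbols('t x')
u = x                    # velocity field u(t, x) = x (illustrative)
phi = t * x**2           # spacetime test function (illustrative)
X = sp.exp(t)            # trajectory with dX/dt = u(t, X) and X(0) = 1

# Sanity check: the trajectory really follows the velocity field.
assert sp.simplify(sp.diff(X, t) - u.subs(x, X)) == 0

along_flow = sp.diff(phi.subs(x, X), t)               # d/dt of phi on the trajectory
material = sp.diff(phi, t) + u * sp.diff(phi, x)      # D_t phi = phi_t + u phi_x
print(sp.simplify(along_flow - material.subs(x, X)))  # 0
```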

We have thus obtained the approximation

which on insertion back into (12) yields

The material derivative of a test function will still be a test function . Therefore we can use our field approximation (9) to conclude that

for all test functions . The left-hand side consists entirely of the limiting fields, so on taking limits we should therefore have the exact equation

We can integrate by parts to obtain

where is the adjoint material derivative

Since the test function was arbitrary, we conclude the continuity equation

or equivalently (and more customarily)

We will eventually specialise to the case of incompressible fluids in which the density is a non-zero constant in both space and time. (In some texts, one uses incompressibility to refer only to constancy of along trajectories: . But in this course we always use incompressibility to refer to *homogeneous incompressibility*, in which is constant in both space and time.) In this incompressible case, the continuity equation (15) simplifies to a divergence-free condition on the velocity:
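In two dimensions, a standard way to manufacture velocity fields satisfying this divergence-free condition is through a stream function ψ, setting u = (∂ψ/∂y, −∂ψ/∂x); the divergence then vanishes identically because mixed partials commute. A quick symbolic check, with an arbitrary illustrative choice of ψ:

```python
import sympy as sp

x, y = sp.symbols('x y')
psi = sp.sin(x) * sp.cos(y)              # arbitrary smooth stream function
u = (sp.diff(psi, y), -sp.diff(psi, x))  # u = (psi_y, -psi_x)

div_u = sp.diff(u[0], x) + sp.diff(u[1], y)
print(sp.simplify(div_u))  # 0: the field is automatically divergence-free
```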

For now, though, we allow for the possibility of compressibility by allowing to vary in space and time. We also note that by integrating (15) in space, we formally obtain conservation of the total mass

since on differentiating under the integral sign and then integrating by parts we formally have

Of course, this conservation law degenerates in the incompressible case, since the total mass is manifestly an infinite constant. (In periodic settings, for instance if one is working in instead of , the total mass is manifestly a finite constant in the incompressible case.)
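As a sanity check on the continuity equation, one can verify an explicit compressible solution: in one space dimension with velocity u(t,x) = x, the density ρ(t,x) = e^(−t) f(x e^(−t)) satisfies the continuity equation for any smooth profile f. Checking with sympy (the choice of u and the ansatz are illustrative):

```python
import sympy as sp

t, x = sp.symbols('t x')
f = sp.Function('f')
u = x                                  # velocity field u(t, x) = x
rho = sp.exp(-t) * f(x * sp.exp(-t))   # transported-and-diluted density

continuity = sp.diff(rho, t) + sp.diff(rho * u, x)
print(sp.simplify(continuity))  # 0 for every profile f
```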

**Exercise 1**

- (i) Assume that the spacetime bulk approximation (8) is replaced by the spatial bulk approximation
for any time and any spatial test function . Give an alternate heuristic derivation of the continuity equation (15) in this case, without using any integration in time. (Feel free to differentiate under the integral sign.)

- (ii) Assume that the spacetime bulk approximation (8) is replaced by the spatial bulk approximation
for any time and any “reasonable” set (e.g., a rectangular box). Give an alternate heuristic derivation of the continuity equation (15) in this case. (Feel free to introduce infinitesimals and argue non-rigorously with them.)

We can repeat the above analysis with the mass distribution (6) replaced by the momentum distribution

thus we now wish to exploit the conservation of momentum rather than conservation of mass. The measure is a vector-valued measure, or equivalently a vector of scalar measures

Instead of starting with the identity (11), we begin with the momentum counterpart

which on applying (17) and evaluating the delta integrations becomes

Using the product rule and (13), the left-hand side is approximately

Applying (1) and (10), we conclude

We can evaluate the second term using (9) to obtain

What about the first term? We can use symmetry and Newton’s third law (2) to write

Now we make the further physical assumption that the only significant interactions between particles are *short-range interactions*, in which and are very close to each other. With this hypothesis, it is then plausible to make the Taylor approximation

We thus have

We write this in coordinates as

for , where we use the Einstein convention that indices are implicitly summed over if they are repeated in an expression, and the stress on the particle at time is the rank -tensor defined by the formula

where denote the components of . Recall from the torque-free hypothesis (4) that is parallel to , thus we could write for some scalar . Thus we have

In particular, we see that the torque-free hypothesis makes the stress tensor symmetric:

To proceed further, we make the assumption (similar to (9)) that the stress tensor (or more precisely, the measure ) is approximated in the bulk by a smooth tensor field (with components for ), in the sense that

The tensor is known as the Cauchy stress tensor. Since is symmetric in , the right-hand side of (21) is also symmetric in , which by the arbitrariness of implies that the tensor is symmetric also:

This is also known as Cauchy’s second law of motion.

Inserting (19) into (21), we arrive at

for any test function and . Taking limits as we obtain the exact equation

and then integrating by parts we have

as the test function is arbitrary, we conclude the Cauchy momentum equation

From the Leibniz rule (in an adjoint form) we see that

using (14) we can thus also write the Cauchy momentum equation in the more conventional form

(This is a dynamical version of Cauchy’s first law of motion.)

To summarise so far, the unknown density field and velocity field obey two equations of motion: the continuity equation (15) (or (14)) and the momentum equation (22). As the former is a scalar equation and the latter is a vector equation, this is equations for unknowns, which looks good – so long as the stress tensor is known. However, the stress tensor is not given to us in advance, and so further physical assumptions on the underlying fluid are needed to derive additional equations to yield a more complete set of equations of motion.

One of the simplest such assumptions is *isotropy* – that, in the vicinity of a given particle at a given point in time, the distribution of the nearby particles (and of the forces ) is effectively rotationally symmetric, in the sense that rotation of the fluid around that particle does not significantly affect the net stresses acting on the particle. To give more mathematical meaning to this assumption, let us fix and , and let us set to be the spatial origin for simplicity. In particular the stress tensor now simplifies a little to

viewing this tensor as a symmetric matrix, we can also write

where we now think of the vector as a -dimensional column vector, and denotes the transpose of .

Imagine that we rotate the fluid around this spatial origin using some rotation matrix , thus replacing with (and hence is replaced with ). If we assume that interaction forces are rotationally symmetric, the interaction scalars should not be affected by this rotation. As such, would be replaced with . If we assume isotropy, though, this rotated fluid should generate the same stress as the original fluid, thus we have

for all rotation matrices , that is to say that commutes with all rotations. This implies that all eigenspaces of are rotation-invariant, but the only rotation-invariant subspaces of are and . Thus the spectral decomposition of the symmetric matrix only involves a single eigenspace , or equivalently is a multiple of the identity. In coordinates, we have

for some scalar (known as the pressure exerted on the particle ), where denotes the Kronecker delta (the negative sign here is for compatibility with other physical definitions of pressure). Passing from the individual stresses to the stress field , we see that is also rotationally invariant and thus is also a multiple of the identity, thus

for some field , which we call the *pressure* field, and which we assume to be smooth. The equations (15), (22) now become the Euler equations

This is still an underdetermined system, being equations for unknowns (two scalar fields and one vector field ). But if we assume incompressibility (normalising for simplicity), we obtain the incompressible Euler equations

Without incompressibility, one can still reach a determined system of equations if one postulates a relationship (known as an equation of state) between the pressure and the density (in some physical situations one also needs to introduce further thermodynamic variables, such as temperature or entropy, which also influence this relationship and obey their own equation of motion). Alternatively, one can proceed using an analysis of the energy conservation law, similar to how (15) arose from conservation of mass and (22) arose from conservation of momentum, though at the end of the day one would still need an equation of state connecting energy density to other thermodynamic variables. For details, see Exercise 4 below.
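The linear-algebra step in the isotropy argument above (a symmetric matrix commuting with every rotation must be a scalar multiple of the identity) can be probed numerically with random rotations obtained from a QR decomposition; all numerical choices below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(d):
    """A random d x d rotation matrix (orthogonal, determinant +1)."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    q = q * np.sign(np.diag(r))   # normalise column signs
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]        # flip one column to make det = +1
    return q

d = 3
scalar = 2.5 * np.eye(d)          # a multiple of the identity...
other = np.diag([1.0, 2.0, 3.0])  # ...and a symmetric matrix that is not

R = random_rotation(d)
print(np.allclose(R @ scalar @ R.T, scalar))  # True for every rotation
print(np.allclose(R @ other @ R.T, other))    # False for a generic rotation
```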

Now we consider relaxing the assumption of isotropy in the stress. Many fluids, in addition to experiencing isotropic stress coming from a scalar pressure field, also experience additional shear stress associated to strain in the fluid – distortion in the shape of the fluid arising from fluctuations in the velocity field. One can thus postulate a generalisation to (23) of the form

where is some function of the velocity at the point in spacetime, as well as the first derivatives , that arises from changes in shape. (Here, we assume that the response to stress is autonomous – it does not depend directly on time, location, or other statistics of the fluid, such as pressure, except insofar as those variables are related to the quantities and . One can of course consider more complex models in which there is a dependence on such quantities or on higher derivatives , etc. of the velocity, but we will not do so here.)

It is a general principle in physics that functional relationships between physical quantities, such as the one in (27), can be very heavily constrained by requiring the relationship to be invariant with respect to various physically natural symmetries. This is certainly the case for (27). First of all, we can impose Galilean invariance: if one changes to a different inertial frame of reference, thus adding a constant vector to the velocity field (and not affecting the gradient at all), this should not actually introduce any new stresses on the fluid. This leads to the postulate

for any , and hence should not actually depend on the velocity and should only depend on the first derivative. Thus we now write for (thus is now a function from to ).

If then there should be no additional shear stress, so we should have . We now make a key assumption that the fluid is a Newtonian fluid, in that the linear term in the Taylor expansion of dominates, or in other words we assume that is a linear function of . (One can certainly study the mechanics of non-Newtonian fluids as well, in which depends nonlinearly on , or even on past values of , but these are no longer governed by the Navier-Stokes equations and will not be considered further here.) One can also think of as a linear map from the space of matrices (which is the space where takes values) to the space of matrices. In coefficients, this means we are postulating a relationship of the form

for some constants (recall we are using the Einstein summation conventions). This looks like a lot of unspecified constants, but again we can use physical principles to impose significant constraints. Firstly, because stress is symmetric in and , the coefficients must also be symmetric in and : . Next, let us for simplicity set to be the spatial origin , and consider a rotating velocity field of the form

for some constant-coefficient anti-symmetric matrix , or in coordinates

The derivative field is then just the anti-symmetric . This corresponds to fluids moving according to the rotating trajectories , where denotes the matrix exponential. (For instance, in two dimensions, the velocity field gives rise to trajectories , corresponding to counter-clockwise rotation around the origin.)

**Exercise 2** When is anti-symmetric, show that the matrix is orthogonal for all , thus

for all , where denotes the identity matrix. (*Hint:* differentiate the expressions appearing in the above equation with respect to time.) Also show that is a rotation matrix, that is to say an orthogonal matrix of determinant .
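The claim of Exercise 2 is easy to check numerically. The following sketch (plain Python; the matrix exponential is computed by a truncated power series, and all function names here are mine, not from the text) exponentiates an antisymmetric matrix and verifies that the result is orthogonal with determinant one:

```python
import math

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(A, terms=40):
    # exp(A) approximated by the truncated power series sum_k A^k / k!
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    power = [row[:] for row in result]  # A^0 = identity
    for k in range(1, terms):
        power = mat_mul(power, A)
        result = [[result[i][j] + power[i][j] / math.factorial(k)
                   for j in range(n)] for i in range(n)]
    return result

def transpose(A):
    return [list(col) for col in zip(*A)]

def det3(M):
    a, b, c = M[0]; d, e, f = M[1]; g, h, i = M[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# an antisymmetric 3x3 matrix (A^T = -A)
A = [[0.0, 0.7, -0.3],
     [-0.7, 0.0, 0.5],
     [0.3, -0.5, 0.0]]
Q = mat_exp(A)
QtQ = mat_mul(transpose(Q), Q)
# Q^T Q should equal the identity, confirming Q is orthogonal
err = max(abs(QtQ[i][j] - (1.0 if i == j else 0.0)) for i in range(3) for j in range(3))
print(err < 1e-9)
print(abs(det3(Q) - 1.0) < 1e-9)
```

Differentiating in time, as the hint suggests, explains why this works: the derivative of the orthogonality defect vanishes identically when the generator is antisymmetric.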

As is orthogonal, it describes a rigid motion. Thus this rotating motion does not change the shape of the fluid, and so should not give rise to any shear stress. That is to say, the linear map should vanish when applied to an anti-symmetric matrix, or in coordinates . Another way of saying this is that only depends on through its symmetric component , known as the rate-of-strain tensor (or the *deformation tensor*). (The anti-symmetric part does not cause any strain, but instead measures the infinitesimal rotation of the fluid; up to trivial factors such as , it is essentially the vorticity of the fluid, which will be an important field to study in subsequent notes.)

To constrain the behaviour of further, we introduce a hypothesis of rotational (and reflectional) symmetry. If one rotates the fluid by a rotation matrix around the origin, then if the original fluid had a velocity of at , the new fluid should have a velocity of at , thus the new velocity field is given by the formula

and the derivative of this velocity field at the origin is then related to the original derivative by the formula

Meanwhile, as discussed in the analysis of the isotropic case, the new stress at the origin is related to the original stress by the same relation:

This means that the linear map is rotationally equivariant in the sense that

for any matrix . Actually the same argument also applies for reflections, so one could also take in the orthogonal group rather than the special orthogonal group.

This severely constrains the possible behaviour of . First consider applying to the rank matrix , where is the first basis (column) vector of . The equivariant property (28) then implies that is invariant with respect to any rotation or reflection of the remaining coordinates . As in the isotropic analysis, this implies that the lower right minor of is a multiple of the identity; when it also implies that the upper right entries or lower left entries for also vanish (one can also obtain this by applying (28) with the reflection in the variable). Thus we have

for some constant scalars (known as the *dynamic viscosity* and *second viscosity* respectively); in matrix form we have

where is the identity matrix. Applying equivariance, we conclude that

for any unit vector ; applying the spectral theorem this implies that

for any symmetric matrix . Since is already known to vanish for non-symmetric matrices, upon decomposing a general matrix into the symmetric part and anti-symmetric part (with the latter having trace zero) we conclude that

for an arbitrary matrix . In particular

In the incompressible case , the second term vanishes, and this equation simply says that the shear stress is proportional to the (rate of) strain.
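Since the displayed formulas did not survive transcription, it may help to record the Newtonian stress law just derived in its standard textbook form (with \(\mu\) the dynamic viscosity and \(\lambda\) the second viscosity; the normalisation may differ from the original post by constant factors):

```latex
\sigma_{ij} \;=\; \mu\,\bigl(\partial_i u_j + \partial_j u_i\bigr) \;+\; \lambda\,(\nabla\cdot u)\,\delta_{ij}.
```

In the incompressible case \(\nabla \cdot u = 0\), the second term vanishes and the shear stress is simply proportional to the rate-of-strain tensor.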

Inserting the above law back into (27) and then (15), (22), we obtain the *Navier-Stokes equations*

where is the spatial Laplacian. In the incompressible case, setting (this ratio is known as the *kinematic viscosity*), and also normalising for simplicity, this simplifies to the *incompressible Navier-Stokes equations*

Of course, the incompressible Euler equations arise as the special case when the viscosity is set to zero. For physical fluids, is positive, though it can be so small that the Euler equations serve as a good approximation. Negative values of are mathematically possible, but physically unrealistic for several reasons (for instance, the total energy of the fluid would increase over time, rather than dissipate over time) and also the equations become mathematically quite ill-behaved in this case (as they carry essentially all of the pathologies of the backwards heat equation).
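For the reader's convenience (again reconstructing formulas lost in transcription), the incompressible Navier-Stokes system referred to here is, in standard notation with \(\nu\) the kinematic viscosity,

```latex
\partial_t u + (u \cdot \nabla) u \;=\; -\nabla p + \nu \Delta u,
\qquad \nabla \cdot u = 0,
```

and the incompressible Euler equations are the special case \(\nu = 0\).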

**Exercise 3** In a constant gravity field oriented in the direction , each particle will experience an external gravitational force , where is a fixed constant. Argue that the incompressible Euler equations in the presence of such a gravitational field should be modified to

and the incompressible Navier-Stokes equations should similarly be modified to

**Exercise 4**

- (i) Suppose that in an -particle system, the force between two particles is given by the conservative law
for some smooth potential function , which we assume to obey the symmetry to be consistent with Newton’s third law. Show that the total energy

is conserved in time (that is to say, the time derivative of this quantity vanishes). Here of course we use to denote the Euclidean magnitude of a vector .

- (ii) Assume furthermore that in the large limit, we have the velocity approximation (10), the stress approximation (21), and the isotropy condition (23). Assume all interactions are short-range. Assume we also have a potential energy approximation
for all test functions and some smooth function (usually called the specific internal energy). By analysing the energy distribution

in analogy with the previous analysis of the mass distribution and momentum distribution , make a heuristic derivation of the energy conservation law

where the *energy density* is defined by the formula (that is to say, is the sum of the *kinetic energy density* and the *internal energy density*). Conclude in particular that the total energy is formally conserved in time.

- (iii) In addition to the previous assumptions, assume that there is a functional relationship between the specific energy density and the density , in the sense that there is a smooth function such that
at all points in spacetime and all physically relevant configurations of particles. (In particular, we are in the compressible situation in which we allow the density to vary.) By dilating the position variables by a scaling factor , scaling the test function appropriately, and working out (via (9)) what the scaling does to the density function , generalise (30) to

Formally differentiate this approximation with respect to at and use (29), (20), (21), (23) to heuristically derive the equation of state

- (iv) Give an alternate derivation of the energy conservation law (31) from the equations (24), (25), (33), (34). (Note that as the fluid here is compressible, one cannot use any equations in this post that rely on an incompressibility hypothesis.)
- (v) Suppose the functional relationship (33) uses a function of the form
for some constant (the specific energy density at equilibrium) and some large ; physically, this represents a fluid which is nearly incompressible in the sense that it is very resistant to having its density deviate from . Assuming that the pressure stays bounded and is large, heuristically derive the approximations

and

Formally, in the limit , we thus heuristically recover an incompressible fluid with a constant specific energy density . In particular the contributions of the specific energy to the energy conservation law (31) may be ignored in the incompressible case thanks to (24).

**Remark 5** In the literature, the relationship between the functional relationship (33) and the equation of state (34) is usually derived instead using the laws of thermodynamics. However, as the above exercise demonstrates, it is also possible to recover this relationship from first principles. In the case of an (isoentropic) ideal gas, the laws of thermodynamics can be used to establish an equation of state of the form for some constants with , as well as the corresponding functional relationship , so that the internal energy density is . This is of course consistent with (33) and (34), after choosing appropriately.
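As a concrete instance of Remark 5 (the standard formulas for an isentropic ideal gas, with constants \(A > 0\) and \(\gamma > 1\) playing the roles of the constants mentioned above), one has

```latex
p(\rho) = A\,\rho^{\gamma}, \qquad
e(\rho) = \frac{A}{\gamma - 1}\,\rho^{\gamma - 1},
\qquad\text{so that}\qquad
p = \rho^2\, e'(\rho) = (\gamma - 1)\,\rho\, e(\rho),
```

which is consistent with the functional relationship and the equation of state derived in the exercise above, and gives the internal energy density \(\rho e = \frac{A}{\gamma-1}\rho^{\gamma}\).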

**Exercise 6** Suppose one has a (compressible or incompressible) fluid obeying the velocity approximation (10), the stress approximation (21), the isotropy condition (23), and the torque free condition (4). Assume also that all interactions are short range. Derive the angular momentum equation

for in two different ways:

- (a) From a heuristic analysis of the angular momentum distribution
analogous to how the mass, momentum, and energy distributions were analysed previously; and

- (b) Directly from the system (24), (25).

- (c) Conclude in particular that the total angular momentum

is formally conserved in time.

**Remark 7** In this set of notes we made the rather strong assumption (10) that the velocities of particles could be approximated by a smooth function of their time and position. In practice, most fluids will violate this hypothesis due to thermal fluctuations in the velocity. However, one can still proceed with a similar derivation assuming that the velocities behave *on the average* like a smooth function , in the sense that

for any test function . The approximation (13) now has to be replaced by the more general

where

is the deviation of the particle velocity from its mean. The second term in (18) now needs to be replaced by the more complicated expression

for any test function . This allows us to heuristically drop the cross terms from (36) involving a single factor of , and simplify this expression (up to negligible errors) as

Repeating the analysis after (18), one eventually arrives again at (19), except that one has to add an additional term

to the stress tensor of a single particle . However, this term is still symmetric, and one can still continue most of the heuristic analysis in this post after suitable adjustments to the various physical hypotheses (for instance, assuming some form of the molecular chaos hypothesis to be able to neglect some correlation terms between and other quantities). We leave the details to the interested reader.

### Benoist’s minicourse “Arithmeticity of discrete groups”. Lecture I: A survey on arithmetic groups

Last week, Jon Chaika, Jing Tao and I co-organized the Summer School on Teichmüller Theory and its Connections to Geometry, Topology and Dynamics at Fields Institute.

This activity was part of the Thematic Program on Teichmüller Theory and its Connections to Geometry, Topology and Dynamics, and it consisted of four excellent minicourses by Yves Benoist, Hee Oh, Giulio Tiozzo and Alex Wright.

These minicourses were fully recorded and the corresponding videos will be available at the Fields Institute video archive in the near future.

Meanwhile, I decided to transcribe my notes of Benoist’s minicourse in a series of four posts (corresponding to the four lectures delivered by him).

Today, we shall begin this series by discussing the statement of the main result of Benoist’s minicourse, namely:

**Theorem 1 (Oh, Benoist–Miquel)** *Let be a semisimple algebraic Lie group of real rank . Suppose that is a horospherical subgroup of , and assume that is a Zariski dense and irreducible subgroup of such that is cocompact. Then, there exists an arithmetic subgroup such that and are commensurable.*

The basic reference for the proof of this theorem (conjectured by Margulis) is the original article by Benoist and Miquel. This theorem completes the discussion in Hee Oh’s thesis where she dealt with many families of examples of semisimple Lie groups (as Hee Oh kindly pointed out to me, the reader can find more details about her contributions to Theorem 1 in these articles here).

**Remark 1** *I came across Benoist–Miquel theorem during my attempts to understand a question by Sarnak about the nature of Kontsevich–Zorich monodromies. In particular, I’m thankful to Yves Benoist for explaining in his minicourse the proof of a result that Pascal Hubert and I used as a black box in our recent preprint here.*

Below the fold, the reader will find my notes of the first lecture of Benoist’s minicourse (whose goal was simply to discuss several keywords in the statement of Theorem 1).

**1. Examples**

Let be a Lie group and consider a discrete subgroup.

**Definition 2**

- is a **lattice** when , i.e., there exists such that and where is a right-invariant Haar measure on .
- is **cocompact** if is compact (i.e., the subset can be chosen compact).

**Example 1** * is (discrete and) cocompact in : indeed, for .*

In general, if is cocompact then is a lattice. However, the converse is not true:

**Example 2** * is a lattice in which is not cocompact. In fact, the compact subsets of are described by the so-called Mahler’s compactness criterion asserting that is relatively compact if and only if .*
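For reference, the criterion alluded to in Example 2 can be stated as follows (a standard formulation, supplied here because the formulas were lost in transcription): a subset \(E \subset SL_n(\mathbb{R})/SL_n(\mathbb{Z})\) is relatively compact if and only if the corresponding unimodular lattices \(g\mathbb{Z}^n\) admit no arbitrarily short nonzero vectors, i.e.,

```latex
\inf_{\,g\,SL_n(\mathbb{Z}) \in E}\ \ \inf_{v \in g\mathbb{Z}^n \setminus \{0\}} \|v\| \;>\; 0.
```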

**Example 3 (Siegel)** *Let be a non-degenerate quadratic form in , , with for all . In this setting, is a lattice in .*

**Remark 2** *It is possible to prove that is cocompact if and only if doesn’t represent zero (i.e., ). Nevertheless, this information is not very useful to produce cocompact lattices because it is possible (using Hasse’s principle) to show that if and is not definite, then represents zero.*

**Example 4**

- is a lattice in .
- can be viewed as a lattice in via the map where is Galois conjugation.

Historically, the basic idea of the previous example was adapted to produce the first examples of discrete *cocompact* subgroups of :

**Example 5** *Let . Then, can be viewed as a lattice in via the map . Note that is definite and, hence, it doesn’t represent zero. Therefore, is a discrete cocompact subgroup of (thanks to Mahler’s compactness criterion).*

In the sequel, we will put all examples above in a single framework.

**2. Arithmetic groups**

Let be an algebraic subgroup (i.e., a subgroup described by the zeros of polynomial functions of the matrix entries of elements of ).

**Definition 3**

- is **simple** if its Lie algebra is simple in the sense that its ideals are trivial.
- is **semi-simple** if where are simple ideals.

If is semi-simple, the adjoint map has finite kernel and finite index image.

In other terms, if is semi-simple, then equals the group of matrices up to finite index. In particular, the adjoint map of a semi-simple group allows one to replace the “extrinsic” algebraic structure by the “intrinsic” algebraic structure modulo finite index.

**Definition 4** *A -form of is the choice of -vector subspace of such that*

- is a Lie subalgebra;
- .

*In other words, a -form of is a choice of basis where the Lie bracket is described by a matrix with rational coefficients.*

**Definition 5** *An arithmetic subgroup of is for some choice of .*

It is possible to check that Examples 2, 3, 4, 5 above describe arithmetic subgroups of semisimple Lie groups . In particular, the fact that these examples provide lattices in can be viewed as concrete applications of Borel–Harish-Chandra theorem:

**Theorem 6 (Borel–Harish-Chandra)** *An arithmetic subgroup of an algebraic semisimple group is a lattice.*

**Remark 3** *One can prove that is cocompact if and only if doesn’t contain nilpotent elements, i.e., there is no such that the matrix is nilpotent.*

**Remark 4** *This theorem naturally leads to the question of the existence of non-arithmetic lattices. As it turns out, this question is answered by the so-called Margulis arithmeticity theorem.*

Let us now pursue the discussion of the statement of Theorem 1 by introducing the notion of irreducible subgroups.

**Definition 7** *Let be an algebraic semisimple group with Lie algebra , where are simple ideals. A discrete subgroup is irreducible if*

*is finite for all .*

**Remark 5** *Any is irreducible when is simple.*

Finally, let us close this section by noticing that Theorem 1 is a sort of “converse” to the so-called Borel density theorem:

**Theorem 8 (Borel)** *Let be a connected semisimple algebraic group with Lie algebra , where are simple ideals. Assume that none of the factors of (associated to ‘s) is compact. Then, any lattice is Zariski dense (i.e., is not included in a proper algebraic subgroup of ).*

**3. Horospherical group**

The last bit of information needed to understand the statement of Theorem 1 is the concept of horospherical group. In the literature, this notion is usually phrased in terms of unipotent radicals of parabolic subgroups. For the sake of exposition, we will give an alternative elementary definition of this notion.

**Definition 9** *Let be a semisimple group with neutral element .*

- an element is **unipotent** whenever such that ;
- a non-trivial subgroup is **horospherical** when such that

**Example 6** *Let , , and where and is the identity matrix.* Note that if , then . Hence,

*is a horospherical subgroup.*
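Since the matrices in Example 6 did not survive the transcription, here is a standard reconstruction of an example of this shape (the specific choice of equal block sizes is mine):

```latex
G = SL_{2d}(\mathbb{R}), \qquad
g = \begin{pmatrix} e^{-1} I_d & 0 \\ 0 & e\, I_d \end{pmatrix}, \qquad
U = \left\{\, u_X = \begin{pmatrix} I_d & X \\ 0 & I_d \end{pmatrix} : X \in M_d(\mathbb{R}) \,\right\},
```

so that \(g^n u_X g^{-n} = u_{e^{-2n} X} \to e\) as \(n \to \infty\), which verifies the horospherical condition for \(U\).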

As it was shown by Kazhdan and Margulis, the cocompactness of lattices is detected by horospherical groups:

**Theorem 10 (Kazhdan–Margulis)** *Let be a lattice in a semisimple Lie group . Then, is not cocompact if and only if contains unipotent elements, if and only if there exists a horospherical such that is cocompact in .*

This completes our discussion of the statement of Theorem 1. Next time, we will start the proof of Theorem 1 in the particular case of Example 6, i.e., , , and .

### Trying to understand the Galois correspondence

Let be a field, and let be a finite extension of that field; in this post we will denote such a relationship by . We say that is a Galois extension of if the cardinality of the automorphism group of fixing is as large as it can be, namely the degree of the extension. In that case, we call the Galois group of over and denote it also by . The fundamental theorem of Galois theory then gives a one-to-one correspondence (also known as the *Galois correspondence*) between the intermediate extensions between and and the subgroups of :

**Theorem 1 (Fundamental theorem of Galois theory)** Let be a Galois extension of .

- (i) If is an intermediate field between and , then is a Galois extension of , and is a subgroup of .
- (ii) Conversely, if is a subgroup of , then there is a unique intermediate field such that ; namely is the set of elements of that are fixed by .
- (iii) If and , then if and only if is a subgroup of .
- (iv) If is an intermediate field between and , then is a Galois extension of if and only if is a normal subgroup of . In that case, is isomorphic to the quotient group .

**Example 2** Let , and let be the degree Galois extension formed by adjoining a primitive root of unity (that is to say, is the cyclotomic field of order ). Then is isomorphic to the multiplicative cyclic group (the invertible elements of the ring ). Amongst the intermediate fields, one has the cyclotomic fields of the form where divides ; they are also Galois extensions, with isomorphic to and isomorphic to the elements of such that modulo . (There can also be other intermediate fields, corresponding to other subgroups of .)
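In the notation that was stripped in transcription, Example 2 reads as follows (a standard statement):

```latex
K = \mathbb{Q}, \qquad L = \mathbb{Q}(\zeta_n), \qquad
\mathrm{Gal}(L/K) \cong (\mathbb{Z}/n\mathbb{Z})^\times, \qquad
\sigma_a \colon \zeta_n \mapsto \zeta_n^{\,a} \quad \text{for } a \in (\mathbb{Z}/n\mathbb{Z})^\times,
```

where \(\zeta_n\) is a primitive \(n\)-th root of unity.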

**Example 3** Let be the field of rational functions of one indeterminate with complex coefficients, and let be the field formed by adjoining an root to , thus . Then is a degree Galois extension of with Galois group isomorphic to (with an element corresponding to the field automorphism of that sends to ). The intermediate fields are of the form where divides ; they are also Galois extensions, with isomorphic to and isomorphic to the multiples of in .
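Similarly, Example 3 in standard notation (again a reconstruction of the stripped formulas) is:

```latex
K = \mathbb{C}(z), \qquad L = \mathbb{C}(z^{1/n}) = K(w)\ \text{ with } w^n = z, \qquad
\mathrm{Gal}(L/K) \cong \mathbb{Z}/n\mathbb{Z}, \qquad
\sigma_k \colon w \mapsto e^{2\pi i k/n}\, w.
```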

There is an analogous Galois correspondence in the covering theory of manifolds. For simplicity we restrict attention to finite covers. If is a connected manifold and is a finite covering map of by another connected manifold , we denote this relationship by . (Later on we will change our function notations slightly and write in place of the more traditional , and similarly for the deck transformations below; more on this below the fold.) If , we can define to be the group of deck transformations: continuous maps which preserve the fibres of . We say that this covering map is a *Galois cover* if the cardinality of the group is as large as it can be. In that case we call the *Galois group* of over and denote it by .

Suppose is a finite cover of . An *intermediate cover* between and is a cover of by , such that , in such a way that the covering maps are compatible, in the sense that is the composition of and . This sort of compatibility condition will be implicitly assumed whenever we chain together multiple instances of the notation. Two intermediate covers are *equivalent* if they cover each other, in a fashion compatible with all the other covering maps, thus and . We then have the analogous Galois correspondence:

**Theorem 4 (Fundamental theorem of covering spaces)** Let be a Galois covering.

- (i) If is an intermediate cover between and , then is a Galois cover of , and is a subgroup of .
- (ii) Conversely, if is a subgroup of , then there is an intermediate cover , unique up to equivalence, such that .
- (iii) If and , then if and only if is a subgroup of .
- (iv) If , then is a Galois cover of if and only if is a normal subgroup of . In that case, is isomorphic to the quotient group .

**Example 5** Let , and let be the -fold cover of with covering map . Then is a Galois cover of , and is isomorphic to the cyclic group . The intermediate covers are (up to equivalence) of the form with covering map where divides ; they are also Galois covers, with isomorphic to and isomorphic to the multiples of in .
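Example 5 can be made concrete with a small numerical sketch (plain Python; the cover is \(z \mapsto z^n\) on the unit circle and the deck transformations are multiplications by \(n\)-th roots of unity; all names here are mine):

```python
import cmath

n = 6

def p(z):
    # covering map of the n-fold cover of the circle
    return z ** n

def deck(k):
    # deck transformation: multiplication by the k-th power of a primitive n-th root of unity
    zeta = cmath.exp(2j * cmath.pi / n)
    return lambda z: zeta ** k * z

# deck transformations preserve the fibres of p
z = cmath.exp(0.37j)  # a point on the unit circle
errors = [abs(p(deck(k)(z)) - p(z)) for k in range(n)]
print(max(errors) < 1e-12)

# the deck group is cyclic of order n: iterating deck(1) n times returns to the start
w = z
for _ in range(n):
    w = deck(1)(w)
print(abs(w - z) < 1e-12)
```

Since \((\zeta^k z)^n = \zeta^{kn} z^n = z^n\), every deck transformation indeed maps each fibre of the cover to itself.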

Given the strong similarity between the two theorems, it is natural to ask if there is some more concrete connection between Galois theory and the theory of finite covers.

In one direction, if the manifolds have an algebraic structure (or a complex structure), then one can relate covering spaces to field extensions by considering the field of rational functions (or meromorphic functions) on the space. For instance, if and is the coordinate on , one can consider the field of rational functions on ; the -fold cover with coordinate from Example 5 similarly has a field of rational functions. The covering relates the two coordinates by the relation , at which point one sees that the rational functions on are a degree extension of that of (formed by adjoining the root of unity to ). In this way we see that Example 5 is in fact closely related to Example 3.

**Exercise 6** What happens if one uses meromorphic functions in place of rational functions in the above example? (To answer this question, I found it convenient to use a discrete Fourier transform associated to the multiplicative action of the roots of unity on to decompose the meromorphic functions on as a linear combination of functions invariant under this action, times a power of the coordinate for .)

I was curious however about the reverse direction. Starting with some field extensions , is it possible to create manifold-like spaces associated to these fields in such a fashion that (say) behaves like a “covering space” to with a group of deck transformations isomorphic to , so that the Galois correspondences agree? Also, given how the notion of a path (and associated concepts such as loops, monodromy and the fundamental group) play a prominent role in the theory of covering spaces, can spaces such as or also come with a notion of a path that is somehow compatible with the Galois correspondence?

The standard answer from modern algebraic geometry (as articulated for instance in this nice MathOverflow answer by Minhyong Kim) is to set equal to the spectrum of the field . As a set, the spectrum of a commutative ring is defined as the set of prime ideals of . Generally speaking, the map that maps a commutative ring to its spectrum tends to act like an inverse of the operation that maps a space to a ring of functions on that space. For instance, if one considers the commutative ring of regular functions on , then each point in gives rise to the prime ideal , and one can check that these are the only such prime ideals (other than the zero ideal ), giving an almost one-to-one correspondence between and . (The zero ideal corresponds instead to the generic point of .)

Of course, the spectrum of a field such as is just a point, as the zero ideal is the only prime ideal. Naively, it would then seem that there is not enough space inside such a point to support a rich enough structure of paths to recover the Galois theory of this field. In modern algebraic geometry, one addresses this issue by considering not just the set-theoretic elements of , but more general “base points” that map from some other (affine) scheme to (one could also consider non-affine base points of course). One has to rework many of the fundamentals of the subject to accommodate this “relative point of view“, for instance replacing the usual notion of topology with an étale topology, but once one does so one obtains a very satisfactory theory.

As an exercise, I set myself the task of trying to interpret Galois theory as an analogue of covering space theory in a more classical fashion, without explicit reference to more modern concepts such as schemes, spectra, or étale topology. After some experimentation, I found a reasonably satisfactory way to do so as follows. The space that one associates with in this classical perspective is not the single point , but instead the much larger space consisting of ring homomorphisms from to arbitrary integral domains ; informally, consists of all the “models” or “representations” of (in the spirit of this previous blog post). (There is a technical set-theoretic issue here because the class of integral domains is a proper class, so that will also be a proper class; I will completely ignore such technicalities in this post.) We view each such homomorphism as a single point in . The analogous notion of a path from one point to another is then a homomorphism of integral domains, such that is the composition of with . Note that every prime ideal in the spectrum of a commutative ring gives rise to a point in the space defined here, namely the quotient map to the ring , which is an integral domain because is prime. So one can think of as being a distinguished subset of ; alternatively, one can think of as a sort of “penumbra” surrounding . In particular, when is a field, defines a special point in , namely the identity homomorphism .

Below the fold I would like to record this interpretation of Galois theory, by first revisiting the theory of covering spaces using paths as the basic building block, and then adapting that theory to the theory of field extensions using the spaces indicated above. This is not too far from the usual scheme-theoretic way of phrasing the connection between the two topics (basically I have replaced étale-type points with more classical points ), but I had not seen it explicitly articulated before, so I am recording it here for my own benefit and for any other readers who may be interested.

** — 1. Some notation on functions — **

It will be convenient to adopt notation in which many of the basic hypotheses we use take the form of an associativity-type law. To this end, we will use two non-standard notations for functions (which is implicit in the usual notation for left and right group actions), sometimes referred to as reverse Polish and Polish notation (for left and right actions respectively).

**Definition 7** (Polish and reverse Polish notation) Let be sets. A *function acting on the right* is a function in which the notation for evaluating at an element is denoted rather than the more standard . If are two functions acting on the right, we write for the composition that is more commonly denoted ; this way we have the associativity-type property

Similarly, a *function acting on the left* is a function in which the notation for evaluating at an element is denoted rather than the more standard . If are two functions acting on the left, we write for the composition that is more commonly denoted ; this way we have the associativity type property

for . We do not define a composition between a function acting on the right and a function acting on the left.

**Remark 8** Functions acting on the left largely correspond to the traditional notion of a function, despite the reversed arrow in the notation ; functions acting on the right can be thought of as “antifunctions”, in which the composition law is reversed.

As a general rule, in this post covering maps and deck transformations will be functions acting on the left, while paths will be functions acting on the right.

When viewing the (left or right) automorphisms of a set as a group, we use juxtaposition rather than composition as the group law; thus, in the case of automorphisms acting on the right, the group structure we will use is the opposite of the usual composition group structure, whereas for automorphisms acting on the left the group structure agrees with the standard composition group structure. In particular, the group of right-automorphisms of a space is canonically identified with the opposite group of the group of left-automorphisms of a space . (Of course, the opposite of a group is in turn canonically identified with the group itself through the inversion operation , but we will refrain from using this further identification in this text as it can cause some additional confusion.)

As is customary, we will adopt the notational convention of omitting parentheses whenever there is an associativity-type law to prevent any ambiguity. For instance, if and , are functions acting on the right, we may write without any ambiguity thanks to (1).

** — 2. Covers of manifolds — **

We begin by revisiting the theory of covering spaces of manifolds. To align with the way we will be thinking about Galois theory, it will be convenient to assume the existence of a (connected) base manifold , such that all the other manifolds we will be considering are finite extensions of that base . Again, for simplicity (and to make the theory more closely resemble Galois theory) we will restrict attention to finite covers, though one could extend most of this discussion to infinite covers without much difficulty.

One can think of the base manifold as a category, in which the objects are the points of the manifold, and the morphisms are the paths in connecting one point to another , with the composition of two paths (connecting to ) with (connecting to ) being a path connecting with formed by concatenating the two paths together. We leave the composition undefined if the terminal point of does not agree with the starting point of . We will not distinguish a path from a reparameterisation of that path (in fact we will not mention parameterisations at all); in particular, we have the associativity law

We denote the space of all paths (up to reparameterisations) by . One should think of as being somewhat like a group, except that the multiplication law is not always defined. (More precisely, is not just a category, but is in fact a groupoid, although we will not explicitly use this fact here.)

To signify the fact that a path has starting point and terminal point , we write

If is not the starting point of , we leave undefined. Note this way that we obtain the associativity-type law

whenever one of the two sides is well-defined (which forces the other side to be well-defined also).

Now let be a finite cover with covering map ; here all covering maps are understood to act on the left. Given a point lying above a base point (thus ), and a path with starting point , we may lift up to a path in that starts at and ends at some point which we will denote by . This point will lie above in , thus we have the associativity-type law

whenever one of the two sides (and hence the other) is well-defined. It is easy to see that this defines a (right) action of in the sense that one has the associativity-type law

whenever and are such that one of the two sides (and hence the other) is well-defined.

Note that if is connected and , then any other point can be connected to by a path in , which is the lift of some path in . Thus the action of is transitive in the sense that for every there exists such that .

If we have finite covers then the actions of on and are compatible in the sense that one has the associativity-type law

whenever one of the two sides (and hence the other) is well-defined. A deck transformation of (viewed as a cover of ) is a continuous map (acting on the left) such that

The space of deck transformations is clearly a group that acts on the left on . By working locally we see that deck transformations map lifts to lifts, thus one has the associativity-type law

whenever , , are such that one of the two sides (and hence the other) is well-defined. Conversely, one can check that any map that obeys the two properties (2), (3) is a deck transformation (even without assuming continuity of !). Hence one can in fact take (2), (3) to be the *definition* of a deck transformation; all the topological structure is now concealed within the path groupoid and its actions on and .

One consequence of (3) is that the action of this group is free: if is such that for some point , then we also have whenever makes sense, and hence by transitivity for all , so is the identity.

In particular, if is a degree cover of (so that all fibres for have cardinality ), then the order of the group cannot exceed , since the orbit of any given point under this free action lies in a fibre of above and has the same cardinality as . If the order equals , we say that the cover is a *Galois cover*, use to denote the group , and conclude that the action of is transitive on fibres. Thus, if are such that , then there exists a unique such that .
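
These definitions can be explored in finite terms: model a degree-n cover by the right action of paths on a single fibre {0, ..., n-1} (that is, by its monodromy permutations); the compatibility law (3) then says that a deck transformation restricts to a permutation of the fibre commuting with every monodromy permutation. A small Python sketch (the function name and the brute-force search are my own, for illustration only):

```python
from itertools import permutations

def deck_group(monodromy, n):
    """All permutations of the fibre {0,...,n-1} commuting with every
    generator of the monodromy action (paths acting on the right,
    deck transformations on the left)."""
    return [s for s in permutations(range(n))
            if all(tuple(s[g[i]] for i in range(n)) ==
                   tuple(g[s[i]] for i in range(n))
                   for g in monodromy)]

# Cyclic degree-3 cover: monodromy generated by the 3-cycle (0 1 2).
# The deck group has full order 3, so the cover is Galois.
print(len(deck_group([(1, 2, 0)], 3)))          # 3

# A connected degree-3 cover whose monodromy is all of S_3: only the
# identity commutes with everything, so the cover is not Galois.
print(len(deck_group([(1, 0, 2), (1, 2, 0)], 3)))  # 1
```

In both examples the monodromy acts transitively (so the cover is connected), and the Galois condition is precisely that the commuting permutations act transitively as well, i.e. that their number reaches the degree n.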

We can now prove Theorem 4. Let be a degree cover. Let be an intermediate cover between and , and suppose that has degree over . A fibre of over has degree , and splits into fibres of over , each of which has cardinality (so in particular divides ). Pick one of these fibres, and pick a point in it. By the above discussion, there are precisely elements of such that lies in the same fibre over as :

Applying paths in on the right and using transitivity (and associativity), we conclude that , thus . As we have located elements of this group, this is in fact the entire group of automorphisms, so we conclude that is a Galois cover of and is a subgroup of . This proves part (i) of the theorem.

A similar argument shows that if then is a subgroup of . In particular, if and are equivalent then . Conversely, suppose that and are such that is a subgroup of . If are such that , then by the Galois nature of over we can find such that . But then , hence . Thus we can factor for some uniquely defined function . It is not hard to verify that is a finite covering map that is compatible with the existing covering maps, and hence . This proves (iii). Reversing the roles of and , we conclude in particular that if then and are equivalent. This gives the “unique up to equivalence” portion of (ii).

To finish the proof of (ii), let be a subgroup of ; then we can define an equivalence relation on by declaring for every and . If we set to be the quotient space with the obvious quotient map , then one has whenever (which is equivalent to ). Thus we may factor

for some unique map . One may easily verify that are both covering maps, so that we have . To finish the proof of (ii) we need to show that . Unpacking the definitions, our goal is to show that is the set of all such that for all . Clearly if then for all . Conversely, suppose that is such that for all . In particular, if one fixes a point , then for some . This implies that for all ; but the action of on is transitive, and hence , giving the claim.

Finally, we prove (iv). By parts (i), (ii), we may assume without loss of generality that , where if for some .

First suppose that is a normal subgroup of . Then implies whenever and . Thus the action of on descends to that of ; since the former is transitive on fibres above , so is the latter. Hence is a Galois extension of . Conversely, suppose that is Galois. Then for any and , there must exist such that . Since the right action of on is transitive, does not actually depend on , and is uniquely specified by . Replacing by for any , we have

but , hence

or equivalently . This implies that , thus is preserved by conjugation by arbitrary elements of . In other words, is a normal subgroup. The map described here can easily be verified to be a homomorphism from to with kernel , so the claim (iv) follows from the first isomorphism theorem.

**Remark 9** Let be a Galois cover, let be a point in the base , and let be a point in the fibre . For any loop starting and ending at (so that ), the point lies in the same fibre of as , hence by the Galois nature of the cover there is a unique such that

It is easy to see that homotopic deformations of (preserving the initial and final point ) do not affect the value of . Thus, we have constructed a map from the fundamental group of at to the Galois group , such that

As is connected, we see that this map is surjective. From (4) and the fact that acts freely on , we also see that the map is a homomorphism. (In many texts, the notation is set up so that this correspondence is an antihomomorphism rather than a homomorphism; we have avoided this by making the paths act on the right and the deck transformations act on the left, thus implicitly introducing an order reversal when identifying the two.) Thus we see that the Galois group is isomorphic to a quotient of the fundamental group . However, this isomorphism is not canonical, even if one fixes the base point , because it depends on ! If one replaces by another point in the same fibre for some , then the associated map , defined by

has to be conjugate to in order to remain compatible with (4):

Thus, if one does not specify the reference point in the covering space , the identification of with a quotient of is only unique up to conjugation. The relationship between fundamental groups and Galois groups is therefore a little subtle, requiring one to keep track of which base points have been selected, unless one is willing to just work up to conjugacy. (See also the above-mentioned post of Kim for some further discussion of this point.)

** — 3. Extensions of fields — **

We can now give an analogous treatment of the Galois correspondence for finite extensions of fields.

In this setting, we will take the base space to be the category of integral domains (with morphisms now interpreted as “paths”). Thus, points in this space are integral domains, and the paths in this space are ring homomorphisms from one integral domain to another, which we will think of as functions acting on the right; in particular, the composition of and will be denoted (rather than the more traditional ). As before, we use the notation to denote the assertion that is a ring homomorphism from to , so in particular we have the associativity-type law

whenever either side is well-defined. As previously mentioned, we will ignore all set-theoretic issues caused by the fact that is a proper class rather than a set, and similarly for the covers of introduced below. We let denote the collection of all ring homomorphisms connecting points in . This is analogous to the space of paths in the base manifold in the covering space context. One caveat, however, is that while paths in manifolds always have inverses, the same is not true in , because not every ring homomorphism between integral domains is invertible. This complicates the situation somewhat compared to the covering space case, but it turns out that the additional level of complication is manageable.

Given a field , we define an associated space , whose points are ring homomorphisms to some ring , where we view these homomorphisms as acting on the right. There is an obvious map (viewed as a function acting on the left) that maps a ring homomorphism to its codomain . If , are such that and , then the composition is a ring homomorphism from to and is thus also an element of . Thus acts on the right on in the sense that

whenever either side is well-defined, and the action is compatible with the base in the sense that

whenever either side is well-defined.

**Remark 10** One can define for any commutative ring in the same fashion. If one takes to be the integers , then is in one-to-one correspondence with , since for every integral domain there is a unique ring homomorphism from to . Thus we are really thinking of all our spaces as being covers of . In the usual scheme-theoretic language, one views the schemes as lying above the base scheme .

**Remark 11** The characteristic of the field naturally restricts the base points that actually lies above. For instance, if has characteristic zero, then points of only lie above those base points for which the unit of has infinite order; similarly, if has a positive characteristic , then points of only lie above base points for which the unit of has order . This helps explain why fields of different characteristic seem to inhabit “different worlds” – they lie above disjoint (and disconnected) portions of the base space ! However, we will not exploit the field characteristic in this post.

As mentioned in the introduction, each space has a distinguished point , which is the identity map on and is the representative in this formalism of . This point makes “directionally connected” in the following sense: every other point in is connected to by a path in the sense that . Indeed, one just takes to be precisely the same homomorphism as . (More generally, in every commutative ring , every point of will be connected via a path from a point in , namely the kernel of which is necessarily a prime ideal since is an integral domain. I like to think of as a sort of “penumbra” lurking around the much smaller set .) However, the connectedness does not flow the other way: not every point in is connected back to by a path, because paths need not be invertible. (For instance, if is an embedding of into a larger field , there is no way to map back into by a ring homomorphism while preserving , as homomorphisms of fields have to be injective.)

Suppose that we have a finite extension of fields. Then the inclusion map , viewed as a function acting on the right, is a ring homomorphism. As such, it also can be viewed as a map from to : any point of , when composed with , gives rise to a point of . We relabel this composition map as (now viewed as a function acting on the left), thus

for all . By construction we see that

and similarly for any nested sequence of fields one has

Now let us look at the fibres of above . We need the following basic result, which relates the degree of the field extension to the degree of the cover :

**Proposition 12** Let be a degree extension of a field . Then every fibre in of a point in has cardinality at most .

*Proof:* Unpacking all the definitions, we need to show that every ring homomorphism into an integral domain has at most extensions to a ring homomorphism .

We induct on . If the claim is trivial, so suppose and the claim has already been proven for smaller values of . Then is larger than , so we may find an element that lies in but not . Let be the degree of , thus , is a degree extension of and is a degree extension of (in particular, divides ). By induction hypothesis, every ring homomorphism on has at most extensions to , so it suffices to show that has at most extensions to .

Let be the minimal polynomial of with coefficients in , thus is monic with degree , and . Any extension of to must obey the law , where is the monic polynomial of degree with coefficients in formed by applying to each coefficient of . As is an integral domain, has at most roots. Thus there are at most possible values of ; since is completely determined by and , we obtain the claim.
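
For a concrete instance of this count, take the base field to be the rationals and the extension generated by a cube root of 2: extensions of the identity embedding into the complex numbers correspond to roots of the minimal polynomial x^3 - 2 in the target ring. A quick numerical sanity check in Python (a sketch, not part of the formal argument; the construction of the roots by hand is mine):

```python
import cmath

# The three roots of x^3 - 2 in C: 2^(1/3) * exp(2*pi*i*k/3), k = 0, 1, 2.
roots = [2 ** (1 / 3) * cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]
for r in roots:
    assert abs(r ** 3 - 2) < 1e-9     # each really is a root

print(len(roots))                      # 3 extensions into C (= the degree)
real = [r for r in roots if abs(r.imag) < 1e-9]
print(len(real))                       # only 1 extension lands in R
```

The second count illustrates that the bound in Proposition 12 need not be attained: the integral domain of real numbers only contains one of the three roots, so the corresponding fibre has cardinality one.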

**Example 13** Let and be the fields from Example 3. The inclusion map is a point in . The fibre consists of points , with being the field automorphism of that sends to (in particular, is the distinguished point of ). In contrast, the distinguished point of has empty fibre , since there is no ring homomorphism from to that fixes .

**Remark 14** If two points in are connected by an invertible path , thus and , then it is easy to see that induces a one-to-one correspondence between the fibre and the fibre ; in particular, the two have the same cardinality. However, in contrast to the situation with covering spaces, not all paths are invertible, and so the cardinality of a fibre can vary with the base , as can already be seen in the preceding example. The situation is better for Galois extensions, however: as we shall shortly see, in that case all non-empty fibres are isomorphic.

We continue analysing a field extension , viewing as analogous to a covering space for . By analogy with covering spaces, we could define a “deck transformation” of over to be a map (acting on the left) such that

whenever , are such that one side (and hence the other) of the above equation is well-defined. Let us see what a deck transformation actually is. If we apply to the distinguished point of , we obtain a ring homomorphism . From (5) we have

which on chasing definitions means that is the identity on . Thus is a -linear map; it also preserves , so it has trivial kernel and is thus injective. As is finite dimensional over , it is also surjective; hence is in fact a field automorphism of that fixes . The action of on any other point of can then be computed by writing where is the path connecting to and using (6) to conclude that . Conversely, given any field automorphism that fixes , one can generate a deck transformation by setting for all ring homomorphisms ; by chasing definitions one verifies that this is a deck transformation. Thus we have a one-to-one correspondence between deck transformations and field automorphisms of fixing . (This correspondence feels reminiscent of the Yoneda lemma, though I was not able to find a direct link between the two.)

**Exercise 15** Show that the correspondence between deck transformations and field automorphisms given above is a group isomorphism. (When functions are expressed in more traditional language, this correspondence is an antiisomorphism rather than an isomorphism. We are able to avoid this reversal of order by making deck transformations act on the left and field automorphisms act on the right.)

One consequence of the above correspondence is that the space of deck transformations of over is in one-to-one correspondence with the group of field automorphisms of fixing . If is an intermediate field, then certainly any deck transformation of over is also a deck transformation of over , so is a subgroup of .

From the above discussion, we see that a deck transformation of over is completely determined by its action on the distinguished point . In particular, the action of is free on the distinguished point of in the sense that the points for are all distinct. As they also lie in the same fibre of above , we conclude from Proposition 12 that the order of cannot exceed the degree of the extension:

As stated in the introduction, we call a Galois extension of if we in fact have equality

and then we rename as . This is canonically equivalent to the usual Galois group over fields. Thus, if is a degree Galois extension of , then implies that the fibre containing above must have cardinality exactly , with acting freely and transitively on this fibre. In fact the same is true for every non-empty fibre:

**Lemma 16** Let be a degree Galois extension of , and let be a point in . Then the fibre containing above has cardinality exactly , and acts freely and transitively on this fibre. In other words, whenever are such that , then there is a unique such that .

*Proof:* As is a field and is an integral domain, the ring homomorphism must be injective (it maps invertible elements to invertible elements, and hence maps non-zero elements to non-zero elements). Each Galois group element corresponds to a different field automorphism of , hence the homomorphisms , are all distinct. This produces distinct elements of the fibre of ; by Proposition 12 this is the entire fibre. The claim follows.

Thanks to this lemma, we see that behaves like a degree covering map on some subset of (the space of ring homomorphisms for which is “large enough” that there is at least one extension of to ).

We can now prove part (i) of Theorem 1, in analogy with the covering space theory argument:

**Proposition 17** If is a Galois extension of , and is an intermediate field, then is a Galois extension of .

Note that we have already demonstrated that is a subgroup of , so this proposition completes the proof of Theorem 1(i).

*Proof:* Let denote the degree of over , and the degree of over ; thus is a degree extension of . Consider a non-empty fibre of above . By Lemma 16, this fibre consists of points. It can also be decomposed into fibres of above , indexed by a fibre of above . By Lemma 16, the latter fibre consists of points, while by Proposition 12, the former fibres all have cardinality at most . Hence all the former fibres must have cardinality exactly . In particular, the fibre of above has cardinality . This fibre consists of those ring homomorphisms that fix . As discussed previously, these ring homomorphisms must be field automorphisms, and thus generate elements of (or ). Thus has cardinality at least , and hence by (7) it has cardinality exactly . Thus is a Galois extension of as required.

**Corollary 18** Let be a Galois extension of , and let be an intermediate field. Then is precisely the set of elements of fixed by every element of .

*Proof:* By definition, every element of is fixed by , thus and hence

On the other hand, by Proposition 17 we have . Finally, from (7) we have

Therefore all inequalities must be equalities, and as claimed.

Now we prove Theorem 1(ii). Let be a subgroup of , which we can also identify with a subgroup of . Let denote the set of elements of that are fixed by ; this is clearly an intermediate field between and :

On the one hand, every element of can be identified with an element of . By Proposition 17, has order exactly equal to . Thus

To prove the reverse inequality, we use an argument of Artin. We first need a simple lemma:

**Lemma 19 (Producing an invariant vector)** Let the notation be as above. Let be the -fold Cartesian product of (viewed as a vector space over ), and let act componentwise on :

Let be an -invariant subspace of (i.e., is a vector space over such that whenever and ). If contains a non-zero element of , then it also contains a non-zero element of .

*Proof:* Let be a non-zero element of with a minimal number of non-zero entries. By relabeling, we may assume that is non-zero; by dividing by , we may normalise . If all the other coefficients lie in then we are done; by relabeling we may thus assume without loss of generality that . By definition of , we may thus find such that . The vector is then a non-zero element of with fewer non-zero entries than , contradicting the minimality of , and the claim follows.

**Remark 20** This argument is reminiscent of the abstract ergodic theorem from this previous blog post, except that now we minimise the norm rather than the norm, since the latter is unavailable over arbitrary fields . While this lemma is simple, it seems to be the one aspect of the Galois correspondence for field extensions that has no obvious analogue in the covering space setting; on the other hand, I do not know how to prove the fundamental theorem of Galois theory without this lemma or something like it (though presumably the original arguments of Galois proceed differently).

**Corollary 21** We have .

*Proof:* Suppose for contradiction that , thus we may find elements of that are linearly independent over . We form a vector out of these elements, and consider the orbit of this vector. Let be the space of vectors orthogonal to this orbit, where we use the usual dot product

in . This space has codimension at most in and thus has a non-zero vector since . Clearly is -invariant. By Lemma 19, contains a non-zero vector whose entries lie in . By construction, is orthogonal to , thus there is a non-trivial linear relation amongst the with coefficients in , contradicting linear independence.
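
Corollary 21 can be checked concretely in the quadratic example: with G = {1, sigma} acting on the extension of the rationals by sqrt(2), any three elements (more than |G| = 2), say 1, sqrt(2) and 1 + sqrt(2), must satisfy a nontrivial rational linear relation, and the orthogonality construction above produces one. A small verification sketch in Python (the pair representation and the particular relation are my own choices for illustration):

```python
from fractions import Fraction as F

# Q(sqrt 2) elements as pairs (a, b) = a + b*sqrt(2); sigma flips the sign of b.
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    return (x[0] * y[0] + 2 * x[1] * y[1], x[0] * y[1] + x[1] * y[0])

def sigma(x):
    return (x[0], -x[1])

zero = (F(0), F(0))
# Three elements of the extension, one more than |G| = 2:
alpha = [(F(1), F(0)), (F(0), F(1)), (F(1), F(1))]   # 1, sqrt2, 1 + sqrt2
# The rational vector w is orthogonal to alpha and to its sigma-orbit;
# it encodes the relation (-1)*1 + (-1)*sqrt2 + 1*(1 + sqrt2) = 0.
w = [(F(-1), F(0)), (F(-1), F(0)), (F(1), F(0))]
for vec in (alpha, [sigma(a) for a in alpha]):
    dot = zero
    for wi, vi in zip(w, vec):
        dot = add(dot, mul(wi, vi))
    assert dot == zero
print("found a nontrivial Q-linear relation:", w)
```

Since w has all entries in the base field, it exhibits the linear dependence over the rationals that Corollary 21 guarantees.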

We thus have , giving the existence part of Theorem 1(ii); the uniqueness part is immediate from Corollary 18.

Theorem 1(iii) is also immediate from Corollary 18, so we turn to Theorem 1(iv). Let . First suppose that is a normal subgroup of . If and , then fixes , thus fixes . Thus , hence by Corollary 18 we have . Thus the action of descends to that of ; equivalently, every deck transformation of above descends to a deck transformation of above . Since the fibre of above has cardinality , and splits into fibres of cardinality above indexed by a fibre of above that has cardinality at most by Proposition 12, and , we conclude that the latter fibre must have cardinality exactly . As acts transitively on the fibre of , it must also act transitively on this fibre of . Thus there are at least distinct deck transformations of above , and so is a Galois extension of thanks to (7), and every deck transformation of above arises from a deck transformation of above .

Thus we have constructed a surjective map from to which can easily be verified to be a homomorphism. The kernel of this homomorphism consists of all deck transformations of above that preserve the fibres of above ; applying the deck transformation to the distinguished point , one sees in particular that the field automorphism of associated to this transformation preserves , and hence this transformation lies in . Conversely, every deck transformation in descends to the identity transformation on . Thus the kernel of the homomorphism is precisely , and the isomorphism of with then follows from the first isomorphism theorem.

Conversely, suppose that is a Galois extension of . Let and . The points , of lie above the same point of ; from Lemma 16 there thus exists such that

On the other hand, as is the distinguished point of , there exists such that . Applying on the right to the above equation, we thus have

But lie above the same point of , thus , and hence

By Lemma 16 again, this implies that for some . As connects to every other point in by a path in , we conclude that , thus . In other words, is a normal subgroup of .

### Iozzi and Duke conferences updates

Just a quick update to indicate that the official FIM pages for the conferences in honor of Alessandra Iozzi’s birthday and Bill Duke’s birthday (both at FIM next year) are now available. Most important are the forms for young researchers to request funding (local expenses) to attend the conferences, here and there. (I was almost going to say to be careful not to apply to the wrong conference, but both will be great, so it doesn’t really matter…)

### QED version 2.0: an interactive text in first-order logic

As readers who have followed my previous post will know, I have been spending the last few weeks extending my previous interactive text on propositional logic (entitled “QED”) to also cover first-order logic. The text has now reached what seems to be a stable form, with a complete set of deductive rules for first-order logic with equality, and no major bugs as far as I can tell (apart from one weird visual bug I can’t eradicate, in that some graphics elements can occasionally temporarily disappear when one clicks on an item). So it will likely not change much going forward.

I feel though that there could be more that could be done with this sort of framework (e.g., improved GUI, modification to other logics, developing the ability to write one’s own texts and libraries, exploring mathematical theories such as Peano arithmetic, etc.). But writing this text (particularly the first-order logic sections) has brought me close to the limit of my programming ability, as the number of bugs introduced with each new feature implemented has begun to grow at an alarming rate. I would like to repackage the code so that it can be re-used by more adept programmers for further possible applications, though I have never done something like this before and would appreciate advice on how to do so. The code is already available under a Creative Commons licence, but I am not sure how readable and modifiable it will be to others currently. [*Update*: it is now on GitHub.]

[One thing I noticed is that I would probably have to make more of a decoupling between the GUI elements, the underlying logical elements, and the interactive text. For instance, at some point I made the decision (convenient at the time) to use some GUI elements to store some of the state variables of the text, e.g. the exercise buttons are currently storing the status of what exercises are unlocked or not. This is presumably not an example of good programming practice, though it would be relatively easy to fix. More seriously, due to my inability to come up with a good general-purpose matching algorithm (or even specification of such an algorithm) for the laws of first-order logic, many of the laws have to be hard-coded into the matching routine, so one cannot currently remove them from the text. It may well be that the best thing to do in fact is to rework the entire codebase from scratch using more professional software design methods.]

[*Update*, Aug 23: links moved to GitHub version.]

### The most valuable mathematical restaurant cards in the world!

Now that Akshay Venkatesh has (deservedly) received the Fields Medal, I find myself the owner of some priceless items of mathematical history: the four restaurant cards on which, some time in (probably) 2005, Akshay sketched the argument (based on Ratner theory) that proves that the Fourier coefficients of a cusp form at and at (say) , for a *non-arithmetic* group, do not correlate. In other words, if we normalize the coefficients (say ) so that the mean-square is , then we have

(Incidentally, the great *persifleur* of the world was also present that week in Bristol, if I remember correctly).

The story of these cards actually starts the year before in Montréal, where I participated in May in a workshop on Spectral Theory and Automorphic Forms, organized by D. Jakobson and Y. Petridis (which, incidentally, remains one of the very best, if not the best, conference that I ever attended, as the programme can suggest). There, Akshay talked about his beautiful proof (with Lindenstrauss) of the existence of cusp forms, and I remember that a few other speakers mentioned some of his ideas (one was A. Booker).

In any case, during my own lecture, I mentioned the question. The motivation is an undeservedly little known gem of analytic number theory: Duke and Iwaniec proved in 1990 that a similar non-correlation holds for Fourier coefficients of *half-integral weight* modular forms, a fact that is of course related to the non-existence of Hecke operators in that context. Since it is known that this non-existence is also a property of non-arithmetic groups (in fact, a characteristic one, by the arithmeticity theorem of Margulis), one should expect the non-correlation to hold also for that case. This is what Akshay told me during a later coffee break. But only during our next meeting in Bristol did he explain to me how it worked.

Note that this doesn’t quite give as much as Duke-Iwaniec: because the ergodic method only gives the existence of the limit, and no decay rate, we cannot currently (for instance) deduce a power-saving estimate for the sum of over primes (which is what Duke and Iwaniec deduced from their own, quantitative, bounds; the point is that a similar estimate, for a Hecke form, would imply a zero-free strip for its -function).

For a detailed write-up of Akshay’s argument, see this short note; if you want to go to the historic restaurant where the cards were written, here is the reverse of one of them:

[Image: restaurant card]

If you want to make an offer for these invaluable objects, please refer to my lawyer.

### Birkar, Figalli, Scholze, Venkatesh

Every four years at the International Congress of Mathematicians (ICM), the Fields Medal laureates are announced. Today, at the 2018 ICM in Rio de Janeiro, it was announced that the Fields Medal was awarded to Caucher Birkar, Alessio Figalli, Peter Scholze, and Akshay Venkatesh.

After the two previous congresses in 2010 and 2014, I wrote blog posts describing some of the work of each of the winners. This time, though, I happened to be a member of the Fields Medal selection committee, and as such had access to a large number of confidential letters and discussions about the candidates with the other committee members; in order to have the opinions and discussion as candid as possible, it was explicitly understood that these communications would not be publicly disclosed. Because of this, I will unfortunately not be able to express much of a comment or opinion on the candidates or the process as an individual (as opposed to a joint statement of the committee). I can refer you instead to the formal citations of the laureates (which, as a committee member, I was involved in crafting, and then signing off on), or the profiles of the laureates by Quanta magazine; see also the short biographical videos of the laureates by the Simons Foundation that accompanied the formal announcements of the winners. I am sure, though, that there will be plenty of other mathematicians who will be able to present the work of each of the medalists (for instance, there was a *laudatio* given at the ICM for each of the winners, which should eventually be made available at this link).

I know that there is a substantial amount of interest in finding out more about the inner workings of the Fields Medal selection process. For the reasons stated above, I as an individual will unfortunately be unable to answer any questions about this process (e.g., I cannot reveal any information about other nominees, or of any comparisons between any two candidates or nominees). I think I can safely express the following two personal opinions though. Firstly, while I have served on many prize committees in the past, the process for the Fields Medal committee was by far the most thorough and deliberate of any I have been part of, and I for one learned an astonishing amount about the mathematical work of all of the shortlisted nominees, which was an absolutely essential component of the deliberations, in particular giving the discussions a context which would have been very difficult to obtain for an individual mathematician not in possession of all the confidential letters, presentations, and other information available to the committee (in particular, some of my preconceived impressions about the nominees going into the process had to be corrected in light of this more complete information). Secondly, I believe the four medalists are all extremely deserving recipients of the prize, and I fully stand by the decision of the committee to award the Fields medals this year to these four.

I’ll leave the comments to this post open for anyone who wishes to discuss the work of the medalists. But, for the reasons above, I will not participate in the discussion myself.

*[Edit, Aug 1: looks like the ICM site is (barely) up and running now, so links have been added. At this time of writing, there does not seem to be an online announcement of the composition of the committee, but this should appear in due course. -T.]*

*[Edit, Aug 9: the composition of the Fields Medal Committee for 2018 (which included myself) can be found here. -T.]*