The way that most physicists teach and talk about partial differential equations is horrible, and has surprisingly big costs for the typical understanding of the foundations of the field even among professionals. The chief victims are students of thermodynamics and analytical mechanics, and I’ve mentioned before that the preface of Sussman and Wisdom’s *Structure and Interpretation of Classical Mechanics* is a good starting point for thinking about these issues. As a pointed example, in this blog post I’ll look at how badly the Legendre transform is taught in standard textbooks,^{a } and compare it to how it *could* be taught. In a subsequent post, I’ll use this as a springboard for complaining about the way we record and transmit physics knowledge.

Before we begin: turn away from the screen and see if you can remember what the Legendre transform accomplishes *mathematically* in classical mechanics.^{b } I don’t just mean that the Legendre transform converts the Lagrangian into the Hamiltonian and vice versa, but rather: what key mathematical/geometric property does the Legendre transform have, compared to the cornucopia of other function transforms, that allows it to connect these two conceptually distinct formulations of mechanics?

(Analogously, the question “What is useful about the Fourier transform for understanding translationally invariant systems?” can be answered by something like “Translationally invariant operations in the spatial domain correspond to multiplication in the Fourier domain” or “The Fourier transform is a change of basis, within the vector space of functions, using translationally invariant basis elements, i.e., the Fourier modes”.)

#### The status quo

Let’s turn to the canonical text by Goldstein for an example of how the Legendre transform is usually introduced. After a passable explanation of why one might want to move from a second-order equation of $n$ variables to a first-order equation of $2n$ variables, we hear this [3rd edition, page 335]:

Treated strictly as a mathematical problem, the transition from Lagrangian to Hamiltonian formulation corresponds to the changing of variables in our mechanical functions from $(q, \dot{q}, t)$ to $(q, p, t)$ by [$p_i = \partial L / \partial \dot{q}_i$]. The procedure for switching variables in this manner is provided by the Legendre transformation, which is tailored for just this type of change of variable.

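Consider a function of only two variables $f(x, y)$, so that a differential of $f$ has the form

$$df = u\,dx + v\,dy, \tag{8.3}$$

where

$$u = \frac{\partial f}{\partial x}, \qquad v = \frac{\partial f}{\partial y}. \tag{8.4}$$

We wish now to change the basis of description from $x, y$ to a new distinct set of variables $u, y$, so that differential quantities are expressed in terms of differentials $du$ and $dy$. Let $g$ be a function of $u$ and $y$ defined by the equation

$$g = f - ux. \tag{8.5}$$

A differential of $g$ is then given as

$$dg = df - u\,dx - x\,du,$$

or, by (8.3), as

$$dg = v\,dy - x\,du,$$

which is exactly in the form desired. The quantities $x$ and $v$ are now functions of the variables $u$ and $y$ given by the relations

$$x = -\frac{\partial g}{\partial u}, \qquad v = \frac{\partial g}{\partial y}, \tag{8.6}$$

which are analogues of Eqs. (8.4).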

The Legendre transformation so defined is used frequently in thermodynamics. The first law of thermodynamics…

Huh? Did you see a clean definition of a *function transform* in there? $g$ is supposed to be a function of $u$ and $y$, but the right-hand side of (8.5) has $x$ dependence. Can we always find a way to eliminate $x$ for arbitrary $f$? What does it mean when we can’t, or there are multiple solutions? And in what sense can $u$ become a variable *independent* of $y$ if its definition, $u = \partial f / \partial x$, depends on $y$? Contrast this to the Fourier, special conformal, or Laplace transforms, which are unambiguous ways to convert a function of one variable to a function of another.

If you can reconstruct a clean definition using this quote, it will be ugly and you will do so by implicitly drawing on your previously obtained knowledge of when one can and cannot treat variables as independent (knowledge that is not accessible to the student reader) and by making assumptions that are true for physical Lagrangians but not true generally (surprise! $f$ has to be convex in $x$). And the motivation for the definition, beyond merely “look at how pretty Hamilton’s equations turn out to be”, will still be opaque.

It would be bad enough if this was just Goldstein because that book is, to my knowledge, the most widely used mechanics textbook, presumably representing the level of clarity achieved by the modal physicist. But I sat down in the library where the classical mechanics books are kept and flipped through seven or eight more^{c } and they were as bad or worse. The venerable textbook by Landau, for instance, uses the same ambiguous differential notation and declines to explain what the Legendre transform is in general; rather, it just declares a formula for the Hamiltonian in terms of the Lagrangian [Vol. 1, 3rd edition, page 131]:

$$H(p, q, t) = \sum_i p_i \dot{q}_i - L \tag{40.2}$$

Notice how the functional parameters are written for the Hamiltonian but not the Lagrangian? It is an impressive sleight of hand designed to distract you from the weird fact that this definition implicitly requires inverting the equation $p = \partial L / \partial \dot{q}$ by solving for $\dot{q}$ in terms of $q$, $p$, and $t$, and then inserting back into Eq. (40.2).

Indeed, serious ambiguities arise when you start trying to literally interpret a quantity with differentials in terms of different variables, some of which are independent and some of which are not. (Remember, when starting from a Lagrangian defined on $(q, \dot{q})$ space, $p$ is generically a function of *both* $q$ and $\dot{q}$.^{d }) And why do we use (40.2) rather than some other combination of $p\dot{q}$ and $L$? Landau doesn’t say, but he does derive Hamilton’s equations a couple of lines later and it’s clear we wouldn’t get the proper cancelation of differentials with a different choice. In other words…look at how pretty Hamilton’s equations are and stop asking questions!

At this point, the typical advice given to the impudently inquisitive student is to look at V.I. Arnold’s *Mathematical Methods of Classical Mechanics*, often described as the definitive, mathematically rigorous treatment…that few bother to read carefully. Here, at least, we find an actual definition in generality [2nd edition, page 61]^{e }:

Let $y = f(x)$ be a convex function, $f''(x) > 0$.

The *Legendre transformation* of the function $f$ is a new function $g$ of a new variable $p$, which is constructed in the following way (Figure 43). We draw the graph of $f$ in the $x, y$ plane. Let $p$ be a given number. Consider the straight line $y = px$. We take the point $x = x(p)$ at which the curve is farthest from the straight line in the vertical direction: for each $p$ the function $px - f(x) = F(p, x)$ has a maximum with respect to $x$ at the point $x(p)$. Now we define $g(p) = F(p, x(p))$.

The point $x(p)$ is defined by the extremal condition $\partial F / \partial x = 0$, i.e., $f'(x) = p$. Since $f$ is convex, the point $x(p)$ is unique [if it exists].

If we were to condense this prescription down, we’d get this rather ugly definition for the Legendre transform $g$ of a convex function $f$:

$$g(p) = p\,(f')^{-1}(p) - f\big((f')^{-1}(p)\big). \tag{1}$$

The convexity specified by Arnold guarantees that $(f')^{-1}$ is well-defined, so at long last we have a clear definition. But the meaning of the transform, and especially the fact that it is its own inverse (i.e., an involution), are completely obscured.

Stare at that figure for a while. Remember: The Legendre transformation links Lagrangian and Hamiltonian mechanics, the two most important formulations of both classical *and* quantum physics. **This transformation binds together the fundamental operating system of the universe, on which all the other physical theories, like electromagnetism and gravity, run merely as programs.**^{f } Do you feel like you understand it? Have you reduced the transformation to its essence?

#### The alternative

Let’s try one more definition, which I originally noticed buried near the bottom of a Wikipedia page. **Two convex functions $f$ and $g$ are Legendre transforms of each other when their first derivatives are inverse functions**:

$$\boxed{g' = (f')^{-1}}$$

*What*? That’s it?!

To confirm that this is equivalent to the above definitions, we solve for $g$ (up to an additive constant) by taking the inverse function $(f')^{-1}$ and computing its anti-derivative:

$$g(p) = \int^{p} (f')^{-1}(\tilde{p})\, d\tilde{p}.$$

The graph of an inverse function is just the graph of the original function flipped about the 45° line, so the integral of an inverse function (area below the curve) plus the integral of the original function (area left of the curve) must equal the bounding rectangle (see figure):

$$g(p) + f(x) = x p. \tag{2}$$

(This is just an oblique use of integration by parts, which is discussed more here.^{g })

The above equation holds for each pair $(x, p)$ satisfying $p = f'(x)$, so we can explicitly confirm that

$$g(p) = p\,(f')^{-1}(p) - f\big((f')^{-1}(p)\big). \tag{3}$$

It is also clear from looking at a graph that a function inverse is symmetric (i.e., $(h^{-1})^{-1} = h$), so our boxed definition makes it manifest that the Legendre transform is an involution on the set of convex (and concave) functions. In Arnold’s Figure 43 this symmetry is, shall we say, less obvious.^{h }
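A quick numerical sanity check of the inverse-derivative characterization may help here. The sketch below is only illustrative: the test function $f(x) = e^x$, the grids, and the helper name `legendre` are my own assumptions, and the transform is approximated by a brute-force maximum of $px - f(x)$ over a grid rather than computed exactly.

```python
import numpy as np

def legendre(f_vals, x, p):
    """Grid approximation of g(p) = max_x [p*x - f(x)] for a convex f."""
    # Rows index the slopes p, columns index the sample points x.
    return np.max(p[:, None] * x[None, :] - f_vals[None, :], axis=1)

# Illustrative choice: f(x) = exp(x), so f'(x) = exp(x) and (f')^{-1}(p) = log(p).
x = np.linspace(-3.0, 3.0, 6001)
p = np.linspace(0.5, 5.0, 451)      # slopes that f' attains well inside the x grid
g = legendre(np.exp(x), x, p)       # exact answer would be p*log(p) - p

# Check 1: the derivative of g is the inverse function of the derivative of f.
g_prime = np.gradient(g, p)
deriv_err = np.max(np.abs(g_prime[1:-1] - np.log(p[1:-1])))

# Check 2: transforming g again returns f (the involution property).
x_test = np.linspace(-0.5, 1.5, 201)
f_back = legendre(g, p, x_test)
invol_err = np.max(np.abs(f_back - np.exp(x_test)))

print(f"max |g' - (f')^-1| = {deriv_err:.1e}")
print(f"max |f** - f|      = {invol_err:.1e}")
```

Both errors should be small and limited only by the grid resolution. The same helper is used for both directions, which is the involution in action: nothing distinguishes the "forward" transform from the "backward" one.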

Hold on, you might object, the above boxed definition only fixes the Legendre transformation up to an additive constant, in contrast to Goldstein, Landau, Arnold, and Eq. (2). How will we determine the absolute values of the Hamiltonian and the Lagrangian? …oh wait, *those aren’t physically meaningful.* All of the dynamical laws are constructed from derivatives of $H$ and $L$, and we decline to specify an additive constant for the same reason we do so with conservative potentials^{i } and, more generally, anti-derivatives.

#### The transform as used in mechanics

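Let’s apply our new characterization of the Legendre transform within mechanics to compare the different approaches. We switch between the Lagrangian and Hamiltonian formulations by changing just the “kinetic” variable (i.e., swapping velocity and momentum), while keeping the configuration coordinates fixed. So let’s introduce functional notation $\partial_{\mathrm{c}}$ and $\partial_{\mathrm{k}}$ with $[\,\cdot\,]^{-1_{\mathrm{c}}}$ and $[\,\cdot\,]^{-1_{\mathrm{k}}}$ for taking the derivative and inverse with respect to just the first (configuration) or second (kinetic) variable, for fixed value of the other.^{j } Then the Legendre transform of the kinetic variable determines the gradient in that direction,

$$\boxed{\partial_{\mathrm{k}} H = [\partial_{\mathrm{k}} L]^{-1_{\mathrm{k}}}}$$

while the gradient in the configuration direction is fixed by

$$\boxed{\partial_{\mathrm{c}} H(q, p) = -\partial_{\mathrm{c}} L\big(q, \partial_{\mathrm{k}} H(q, p)\big)}$$

^{k }

The symmetry between $L$ and $H$ is *almost* manifest here; this last boxed equation is equivalent to $\partial_{\mathrm{c}} L(q, \dot{q}) = -\partial_{\mathrm{c}} H\big(q, \partial_{\mathrm{k}} L(q, \dot{q})\big)$ because $\partial_{\mathrm{k}} L$ and $\partial_{\mathrm{k}} H$ are inverse functions on $\mathbb{R}$.^{l }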

The two boxed equations above define the relationship between the Lagrangian and Hamiltonian through the Legendre transform of the kinetic coordinates. **The first box says the gradients of $L$ and $H$ in the kinetic direction are inverses of each other. The second box says the gradients in the configuration direction are negatives of each other once you account for the change of kinetic variables.**

Now, after you understand this, it’s of course perfectly fine to use the mnemonic $H = p\dot{q} - L$, along with $p = \partial L / \partial \dot{q}$ and $\dot{q} = \partial H / \partial p$, to remember how to quickly compute the Hamiltonian from the Lagrangian, or vice versa. But those cartoon equations suppress all the important structure that tells you what is actually going on, and the equal footing of $H$ and $L$ merely gestures at the symmetry of the transformation. In particular, the fact that you can uniquely solve for either function in terms of their respective pair of independent variables is guaranteed by the tacit assumption of convexity. More philosophically, only the gradients of these functions, not their actual values, are physically meaningful.
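As a concrete check, here is the one-dimensional harmonic oscillator worked through with the mnemonic and then compared against the two conditions described above (kinetic gradients are inverses, configuration gradients are negatives). The choice of system, with mass $m$ and spring constant $k$, is mine and is only meant as an illustration.

```latex
\begin{align*}
L(q,\dot q) &= \tfrac12 m\dot q^2 - \tfrac12 k q^2, \\
p &= \frac{\partial L}{\partial \dot q} = m\dot q
  \quad\Longrightarrow\quad \dot q = \frac{p}{m}, \\
H(q,p) &= p\,\dot q - L = \frac{p^2}{2m} + \tfrac12 k q^2, \\
\frac{\partial H}{\partial p} &= \frac{p}{m}
  \quad \text{(the inverse of the map } \dot q \mapsto m\dot q\text{)}, \\
\frac{\partial H}{\partial q} &= kq = -\frac{\partial L}{\partial q}.
\end{align*}
```

Here the kinetic map $\dot{q} \mapsto m\dot{q}$ is linear and invertible because $L$ is convex in $\dot{q}$ (that is, $m > 0$), which is exactly the tacit assumption that makes the mnemonic well defined.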

OK, so that’s basically all we need to know about the transform to discuss the terribleness of physics textbooks in the forthcoming blog post. In this next and final section, I’ll briefly generalize the transform a bit for added context, but consider it optional reading.

#### Multiple variables and non-convex functions

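The Legendre transform can be extended to multivariate functions $f(\mathbf{x})$ and $g(\mathbf{p})$ without any surprises. One clear definition is

$$g(\mathbf{p}) = \mathbf{p} \cdot (\nabla f)^{-1}(\mathbf{p}) - f\big((\nabla f)^{-1}(\mathbf{p})\big), \tag{4}$$

where the inverse function $(\nabla f)^{-1}$ exists because $\nabla f$ is a bijection on $\mathbb{R}^n$ when we assume (as we must) that $f$ is convex. In the approach advocated above, this is restated more elegantly as

$$\boxed{\nabla g = (\nabla f)^{-1}}$$

where, again, the involutivity is manifest. For switching between Lagrangians and Hamiltonians, we use the defining conditions

$$\nabla_{\mathrm{k}} H = [\nabla_{\mathrm{k}} L]^{-1_{\mathrm{k}}}, \qquad \nabla_{\mathrm{c}} H(\mathbf{q}, \mathbf{p}) = -\nabla_{\mathrm{c}} L\big(\mathbf{q}, \nabla_{\mathrm{k}} H(\mathbf{q}, \mathbf{p})\big), \tag{5}$$

where the subscripts $\mathrm{c}$ and $\mathrm{k}$ now refer respectively to groups of configuration and kinetic variables in the natural way. The interpretation is the same: the kinetic gradients are inverses, and the configuration gradients are negatives.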

One can generalize (4) to non-convex functions as

$$g(\mathbf{p}) = \sup_{\mathbf{x}} \big[\mathbf{p} \cdot \mathbf{x} - f(\mathbf{x})\big], \tag{6}$$

where the supremum is of course just the formal way of talking about a maximum over all choices of $\mathbf{x}$. When $f$ is convex and smooth, the right-hand side is maximized under the condition that $\mathbf{p} = \nabla f(\mathbf{x})$, so we recover (4). But more generally, this formula ensures the transform is well defined for any $f$, and the output $g$ is convex regardless.^{m } Basically, we’ve defined a way of breaking the ambiguity that comes when $\nabla f$ doesn’t have a unique inverse, and it turns out that this judicious choice ensures $g$ is the Legendre transform of the convex hull of $f$! In other words, applying the Legendre transform *twice* acts as the identity on convex functions but it produces the convex hull on non-convex ones.^{n }
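The convex-hull claim is easy to see numerically. The following is a rough sketch under my own assumptions: a double-well test function $f(x) = (x^2 - 1)^2$, finite grids standing in for the supremum, and a hypothetical helper named `conjugate`. Its convex hull is known in closed form (zero between the two minima, equal to $f$ outside), so the double transform can be compared against it directly.

```python
import numpy as np

def conjugate(f_vals, x, p):
    """Grid approximation of the convex conjugate sup_x [p*x - f(x)]."""
    return np.max(p[:, None] * x[None, :] - f_vals[None, :], axis=1)

# Illustrative non-convex input: the double well f(x) = (x^2 - 1)^2.
x = np.linspace(-2.0, 2.0, 2001)
f = (x**2 - 1.0) ** 2
p = np.linspace(-20.0, 20.0, 2001)   # |f'| reaches 24 at the ends of the x grid

g = conjugate(f, x, p)               # first transform
f_twice = conjugate(g, p, x)         # second transform, back on the x grid

# Convex hull of the double well: flat between the minima, equal to f outside.
hull = np.where(np.abs(x) <= 1.0, 0.0, f)

inner = np.abs(x) <= 1.5             # stay away from the truncated grid edges
hull_err = np.max(np.abs(f_twice[inner] - hull[inner]))
second_diff = g[:-2] - 2.0 * g[1:-1] + g[2:]   # >= 0 for a convex g

print(f"max |f** - hull| = {hull_err:.1e}")
print(f"min second difference of g = {second_diff.min():.1e}")
```

The bump of height 1 at $x = 0$ is erased by the double transform, while the outer branches of $f$ come back unchanged. The second differences of $g$ are non-negative up to rounding, consistent with the output being convex even though the input is not.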

*[I thank Robert Lasenby and Godfrey Miller for discussion.]*

### Footnotes

(↵ returns to text)

- I was pleased to note as this essay went to press that my choices of Landau, Goldstein, and Arnold were confirmed as the “standard” suggestions by the top Google results.↵
- If not, can you remember the definition? I couldn’t, a month ago.↵
- Landau & Lifshitz, Hand & Finch, Rossberg, plus a bunch I hadn’t heard of before.↵
- To keep my blood pressure in check, I’m just going to skip completely over the fact that this Hamiltonian has dependence. Almost all physics textbooks fail to clearly explain to the student why we can nevertheless indulge in the sin of pretending that $\dot{q}$ is an independent variable from $q$, attributable perhaps to the mystery of faith. For clarity on this, I suggest picking up Gelfand and Fomin’s “Calculus of variations”, which is well regarded and available on Amazon for $9.↵
- A similar precise but opaque definition is given in the text by Jose and Saletan, which was inflicted upon me in graduate school. Figures like the one below can be found in Hand & Finch, and in various notes floating around that promise an “easy introduction” to the Legendre transform.↵
- Scott Aaronson, PHYS771 Lecture 9: Quantum: ‘So, what is quantum mechanics? Even though it was discovered by physicists, it’s not a physical theory in the same sense as electromagnetism or general relativity. In the usual “hierarchy of sciences” — with biology at the top, then chemistry, then physics, then math — quantum mechanics sits at a level between math and physics that I don’t know a good name for. Basically, quantum mechanics is the operating system that other physical theories run on as application software (with the exception of general relativity, which hasn’t yet been successfully ported to this particular OS). There’s even a word for taking a physical theory and porting it to this OS: “to quantize”.’
Note, by “the level between physics and math”, Aaronson is talking about the basic information theoretic underpinning of quantum mechanics, whose classical analog is probability theory (which, classically, is generally assumed without discussion). In this post I am talking about the abstract formalism of Lagrangian/Hamiltonian mechanics, which sits above quantum/classical information theory but still below (i.e., more fundamental than) particular physical theories like electromagnetism or gravity.↵

- Thanks to Godfrey Miller for this link.↵
- See also Arnold’s Figure 45, where he tries (in my opinion, hopelessly) to make the involution property intuitive.↵
- By the way, you weren’t fooled by the Aharonov-Bohm effect into thinking that the potential is more “real” in quantum mechanics, were you?↵
- Here I’m adopting something close to the functional notation of Sussman and Wisdom.↵
- We actually only need this equation to hold on a single value of the kinetic variable since the previous boxed equation automatically extends it to all values. What’s happening here is this: although the additive offset we choose when transforming the kinetic variable ($\dot{q} \leftrightarrow p$) is arbitrary, we need to choose the *same* offset for different values of the configuration coordinate ($q$) since the configuration gradient is physically important.↵
- At a more advanced level this is known to map between the tangent and cotangent bundles, so the domain and range are isomorphic but not strictly the same.↵
- Indeed, generalizing the Legendre transform to non-convex functions, where it is usually known as the convex conjugate, is a justification for using a definition like Eq. (1) rather than $g' = (f')^{-1}$. See here for more, as well as a nice description in terms of hyperplanes.↵
- I thank Kristan Temme for emphasizing this non-mechanical aspect of the Legendre transform to me.↵

Thank you for this post!

Making use of slightly more abstract mathematical concepts to truly understand stuff that you are normally taught only in a “practical” way is one of the most satisfying feelings I have while studying physics.

Thank you for this! Poor explanations survive because nobody wants to admit they found the greats (L&L, Goldstein…) confusing!

I remember seeing a picture like yours (areas under and to the left of the curve) in Hilbert & Courant’s “Mathematical methods of Physics”, probably in the chapter on the calculus of variations.

I’m looking at Volume 1 of the 1989 edition and I can’t find a picture. The discussion of the Legendre transform is in Section IV.9 (p. 231 – 242).

This is Tropical Mathematics!

See e.g. Gabriel Peyré: “Fourier is to convolution what Legendre is to inf-convolution.” (https://twitter.com/gabrielpeyre/status/918529877829185542)

See also Grigory L. Litvinov: “Tropical Mathematics, Idempotent Analysis, Classical Mechanics and Geometry” (https://arxiv.org/abs/1005.1247).

I don’t think you’re being quite fair to Arnold. In particular, the fact that the Legendre transform defined in §14A is an involution is explicitly noted and proved in the immediately following §14B (I am using the 3rd Russian edition, 1988, but paragraph indexing ought to be edition- and translation-consistent). Further on, if the student is intrepid enough to not be deterred by exterior calculus and symplectic manifolds, Arnold discusses the Lagrange conjugates in terms of tangent and cotangent bundles in the very informative §§37-38, and the origins of the Legendre transform in optics in §46. As for the first derivatives approach to Legendre transform, I feel that it might be related to the 1-form formalism of Hamiltonian mechanics (§44) which explicitly dispenses with the non-physical additive constant in the Hamiltonian and, as a bonus, facilitates the use of non-canonical transformations in perturbation theory (Littlejohn, 1982).

*Legendre conjugates

Thanks for the comments, Anton Tykhyi!

> …the fact that the Legendre transform defined in §14A is an involution is explicitly noted and proved in the immediately following §14B

Sure, he is forced to spend an *entire page* proving this because he must work with his clumsy geometric definition of the transform. (It is §14C in the 2nd English edition because §14B contains examples.) This is why I said “…the fact that it is its own inverse (i.e., an involution) [is] completely obscured” (and mentioned in my footnote h that he demonstrates the involution property) instead of saying “…is left unproven”. Indeed, students already know from elementary sources like Goldstein that this is an involution… they just can’t remember exactly why.

Imagine someone was teaching relativistic field theory but used notation and definitions that completely obscured the fact that the forms of the Lagrangians are Lorentz invariant. Suppose it took them a full page to prove invariance for a single example, perhaps because they have to break their proof into different parts, depending on whether it’s a rotation or a boost. This would be *terrible*.

I didn’t pick Arnold because I thought he did a relatively bad job, I picked him because he did the *best* job of any single human! But no single human is good enough, and we need better ways to allow other humans to contribute so we can build super-human documents.

> …if the student is intrepid enough to not be deterred by exterior calculus and symplectic manifolds, Arnold discusses the Lagrange conjugates in terms of tangent and cotangent bundles in the very informative §§37-38

Yes, some 150 dense pages later! Why should the student need to understand the exterior calculus to understand that Legendre/Young duals are simply defined by having inverse derivatives?!

As I have emphasized in many places, the criticism isn’t that this information isn’t available somewhere. It’s that the pedagogical material is so blatantly flawed yet remains unimproved decade after decade. Likewise, Wikipedia is stupendously useful even though almost all the information it contains is accessible, in principle, elsewhere.

This cashes out in the fact that very few physicists know anything about the Legendre transform except perhaps cookbook recipes provided by Goldstein/Landau/whoever. And many cannot even remember that. Just as “justice delayed is justice denied”, physics knowledge obscured is (effectively) physics knowledge unknown.

(Edited to fix HTML.)

> It is §14C in the 2nd English edition because §14B contains examples.

My pardon – that was §14B with the Cyrillic B, with §14Б containing problems in small type intervening. §14B is on the same page as §14A so I wrote “immediately”.

> It’s that the pedagogical material is so blatantly flawed yet remains unimproved decade after decade.

Biologists have it a little better – e.g. up-to-date editions of the “Molecular Biology of the Cell” seem to be issued fairly regularly – but it’s partly forced on them by the peculiarities of their discipline, which relies so much on descriptive and empirical results as against theoretical models. I agree that in XXI century we might be technically able to have better pedagogical material in physics, and it might be a very good thing to have it, too, even though the vast majority of working physicists (except a handful of people at the Perimeter and a few similar places who work at the bleeding edge of hep-th/gr-qc) have no need to know anything about e.g. the Legendre transform, except perhaps as a cookbook recipe 10^{-3} times pppy. This means there is little outside incentive to improve the pedagogical materials much (Landau&Lifshitz wrote their textbook to teach the people who would be designing and building Soviet nuclear bombs and other weapons! (Rhodes, 1995)), and as a result the textbooks are full of outdated knowledge and use hoary formalisms (I mentioned exterior calculus above – its basics are not that complicated, the statement of Stokes’ theorem is trivial, and it is so much handier to use than partial derivatives with indices and Levi-Civita symbols) on the one hand, and often don’t give the motivation behind this or that theory etc. on the other – why is the line of nodes in Euler’s angles called that? And please don’t get me started on schoolbooks.

As for the infrastructure for composing better textbooks, adequate tools have been developed and are in extremely active use in the software community*. It’s so great an improvement on methods in use just 10-15 years ago that even Microsoft has moved most of their projects to GitHub (Windows source is too big and active and needs custom servers, but e.g. .NET Framework/Core with compilers, libraries and miscellaneous paraphernalia is online). I’ve read your GitWikXiv posts and most of your wish-list is implemented and used in one form or another. For instance, what you call attribution (GWX#2) is called ‘blame’ in software, and is enormously useful; additional information can be entered in commit messages, commits can have multiple authors, and if authors’ contributions are separate they should be committed separately. git does have a fairly steep learning curve, but it is a very powerful tool, and there are a lot of wrappers that simplify common tasks like pull requests (not much more difficult than editing Wikipedia btw) and merging. Using these tools for creating and publishing scientific material should be an obvious idea, though they might need additions and customizations for convenience. Imagine how nice it would be if a paper was a (tag in a) standalone repository (including code and raw data if not available elsewhere and not HUGE, in which cases adequate resource locators (possibly including data hashes) should be specified) that one could clone, run “make”, have the computer download all necessary packages and stuff, compile code, perform calculations and get the same pdf result as published. Journals or publishers could accept papers as repos, review and editing could move to private branches etc. I confess I don’t quite understand your wish for a visual (La)TeX/successor editor, but then I do physics as a sort of a legacy project, and as a software developer I am used to editing source. 
Perhaps with the proliferation of skinny 16:9 screens and powerful CPUs a split view – editable source on left, navigable pdf on right – would work.

As for acceptance by the wider community of any new tools and methods if developed, that would probably depend on hep-th/gr-qc community as the one with the most historical prestige – a sort of aristocracy of physics. Didn’t it create ArXiv? Now that arXiv is quite well-established and even spreading to other disciplines like biology, it’s time for something new. There being few formal or material incentives, informal incentives and pure noblesse oblige would have to be it. (The same goes for textbooks.) I’m sure enough software developers could be found to work/help on such projects – many of us are so cracked that we code for money at work and then code for pleasure in our spare time!

* I don’t know how much you use these tools yourself; please excuse me if I repeat what you already know or sound preachy.

I learned something about the Russian alphabet today 🙂

> Biologists have it a little better…but it’s partly forced on them by the peculiarities of their discipline, which relies so much on descriptive and empirical results as against theoretical models.

Agreed. Even within physics, “The Review of Particle Physics” is a somewhat similar document that exists because of the huge and rapid flow of data out of particle physics experiments. It seems to be much harder (perhaps unsurprisingly) to collaborate on foundational theory than on collating empirical results.

> This means there is little outside incentive to improve the pedagogical materials much

Yep, I agree with your assessment 🙁

> As for the infrastructure for composing better textbooks, adequate tools have been developed and are in extremely active use in the software community…most of your wish-list is implemented and used in one form or another.

Oh yes, I agree that almost all the basic tools are implemented somewhere in some form. The remaining barriers seem to be: (1) Tools are too painful/confusing to use; (2) Researchers are insufficiently incentivized to use them; (3) Researchers don’t know the tools exist; (4) Chicken-or-egg, i.e., the tools aren’t useful until other people are using them; (5) Copyright restrictions. These barriers trade off against each other to some extent, e.g., researchers might start using these tools if *either* they get more incentive *or* if it was less painful to use. But in general, the existence of tools does not mean we don’t need more and better tool development. After all, there were dozens of cloud storage providers that existed before Dropbox; the technical improvements that Dropbox had over its competitors were in some sense trivial issues of UI and error avoidance. Yet this made a difference of millions of users and billions of dollars.

To some extent, this is unfair to tool developers; why should these free tools have to be *perfect*? But on the other hand, we are all trapped in a bad equilibrium of a big collective action problem. It seems *much* easier to build nicer tools (something that can be accomplished by just a handful of developers) than it is to change the incentive structure of academia.

> For instance, what you call attribution (GWX#2) is called ‘blame’ in software, and is enormously useful; additional information can be entered in commit messages, commits can have multiple authors, and if authors’ contributions are separate they should be committed separately.

Yes, but this needs to be integrated into the arXiv and ORCID (the academic standard for author identification). Just like the original arXiv faced a chicken-or-egg problem with respect to scientific priority (i.e., that posting to the arXiv “counted” with respect to claiming originality, a norm that did not develop until long after the arXiv was thriving), an academic attribution-tracking system would need to be trusted as “authoritative”.

> I confess I don’t quite understand your wish for a visual (La)TeX/successor editor, but then I do physics as a sort of a legacy project, and as a software developer I am used to editing source. Perhaps with the proliferation of skinny 16:9 screens and powerful CPUs a split view – editable source on left, navigable pdf on right – would work.

No, editing source is fine. Most physicists are quite comfortable with it, and we indeed use the split-screen set ups you describe. The issue is just that LaTeX is crappy (e.g., a proliferation of incompatible packages, mysterious error messages, etc.). Even the simplest things, like creating a document with three columns, requires reading through multiple forums and choosing packages from unknown authors which will interact in unpredictable ways with future packages.

This issue is pretty separated from the rest of the discussion, though. I don’t think it’s necessary to create a LaTeX successor to accomplish everything else.

> As for acceptance by the wider community of any new tools and methods if developed, that would probably depend on hep-th/gr-qc community …Didn’t it create ArXiv?

Yep, hep-th. (Although it wasn’t called the hep-th community before the ArXiv!)

> I’m sure enough software developers could be found to work/help on such projects – many of us are so cracked that we code for money at work and then code for pleasure in our spare time!

Agreed. The willingness among software developers to contribute to public projects with no hope of direct material rewards is, as far as I can tell, unparalleled in any other discipline.