The way that most physicists teach and talk about partial differential equations is horrible, and has surprisingly big costs for the typical understanding of the foundations of the field even among professionals. The chief victims are students of thermodynamics and analytical mechanics, and I’ve mentioned before that the preface of Sussman and Wisdom’s *Structure and Interpretation of Classical Mechanics* is a good starting point for thinking about these issues. As a pointed example, in this blog post I’ll look at how badly the Legendre transform is taught in standard textbooks,^{a } and compare it to how it *could* be taught. In a subsequent post, I’ll use this as a springboard for complaining about the way we record and transmit physics knowledge.

Before we begin: turn away from the screen and see if you can remember what the Legendre transform accomplishes *mathematically* in classical mechanics.^{b } I don’t just mean that the Legendre transform converts the Lagrangian into the Hamiltonian and vice versa, but rather: what key mathematical/geometric property does the Legendre transform have, compared to the cornucopia of other function transforms, that allows it to connect these two conceptually distinct formulations of mechanics?

(Analogously, the question “What is useful about the Fourier transform for understanding translationally invariant systems?” can be answered by something like “Translationally invariant operations in the spatial domain correspond to multiplication in the Fourier domain” or “The Fourier transform is a change of basis, within the vector space of functions, using translationally invariant basis elements, i.e., the Fourier modes”.)

#### The status quo

Let’s turn to the canonical text by Goldstein for an example of how the Legendre transform is usually introduced. After a passable explanation of why one might want to move from a second-order equation of $n$ variables to a first-order equation of $2n$ variables, we hear this [3rd edition, page 335]:

Treated strictly as a mathematical problem, the transition from Lagrangian to Hamiltonian formulation corresponds to the changing of variables in our mechanical functions from $(q, \dot{q}, t)$ to $(q, p, t)$ by [$p_i = \partial L / \partial \dot{q}_i$]. The procedure for switching variables in this manner is provided by the Legendre transformation, which is tailored for just this type of change of variable.

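Consider a function of only two variables $f(x, y)$, so that a differential of $f$ has the form

$$df = u\,dx + v\,dy, \tag{8.3}$$

where

$$u = \frac{\partial f}{\partial x}, \qquad v = \frac{\partial f}{\partial y}. \tag{8.4}$$

We wish now to change the basis of description from $x, y$ to a new distinct set of variables $u, y$, so that differential quantities are expressed in terms of differentials $du$ and $dy$. Let $g$ be a function of $u$ and $y$ defined by the equation

$$g = f - ux. \tag{8.5}$$

A differential of $g$ is then given as

$$dg = df - u\,dx - x\,du,$$

or, by (8.3), as

$$dg = v\,dy - x\,du,$$

which is exactly in the form desired. The quantities $x$ and $v$ are now functions of the variables $u$ and $y$ given by the relations

$$x = -\frac{\partial g}{\partial u}, \qquad v = \frac{\partial g}{\partial y}, \tag{8.6}$$

which are analogues of Eqs. (8.4).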

The Legendre transformation so defined is used frequently in thermodynamics. The first law of thermodynamics…

Huh? Did you see a clean definition of a *function transform* in there? $g$ is supposed to be a function of $u$ and $y$, but the right-hand side of (8.5) has $x$ dependence. Can we always find a way to eliminate $x$ for arbitrary $f$? What does it mean when we can’t, or there are multiple solutions? And in what sense can $u$ become a variable *independent* of $y$ if its definition, $u = \partial f / \partial x$, depends on $y$? Contrast this to the Fourier, special conformal, or Laplace transforms, which are unambiguous ways to convert a function of one variable to a function of another.

If you can reconstruct a clean definition using this quote, it will be ugly and you will do so by implicitly drawing on your previously obtained knowledge of when one can and cannot treat variables as independent (knowledge that is not accessible to the student reader) and by making assumptions that are true for physical Lagrangians but not true generally (surprise! $f$ has to be convex in $x$). And the motivation for the definition, beyond merely “look at how pretty Hamilton’s equations turn out to be”, will still be opaque.

It would be bad enough if this were just Goldstein, because that book is, to my knowledge, the most widely used mechanics textbook, presumably representing the level of clarity achieved by the modal physicist. But I sat down in the library where the classical mechanics books are kept and flipped through seven or eight more^{c } and they were as bad or worse. The venerable textbook by Landau, for instance, uses the same ambiguous differential notation and declines to explain what the Legendre transform is in general; rather, it just declares a formula for the Hamiltonian in terms of the Lagrangian [Vol. 1, 3rd edition, page 131]:

$$H(p, q, t) = \sum_i p_i \dot{q}_i - L. \tag{40.2}$$

Notice how the functional parameters are written for the Hamiltonian but not the Lagrangian? It is an impressive sleight of hand designed to distract you from the weird fact that this definition implicitly requires inverting the equation $p_i = \partial L / \partial \dot{q}_i$ by solving for $\dot{q}$ in terms of $q$, $p$, and $t$, and then inserting $\dot{q}(q, p, t)$ back into Eq. (40.2).
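To make the hidden inversion step concrete, here is a minimal Python sketch of the recipe in (40.2) for one hypothetical example: a relativistic particle (units with $c = 1$) with Lagrangian $L = -m\sqrt{1 - \dot{q}^2} - q^2/2$. The function names are illustrative, not taken from any textbook, and the momentum is computed by a finite difference so that nothing about $L$ is used except its convexity in $\dot{q}$. The step Landau glosses over is the bisection that solves $p = \partial L / \partial \dot{q}$ for $\dot{q}$.

```python
import math

M = 1.0  # hypothetical mass; units with c = 1

def lagrangian(q, v):
    """Hypothetical example: relativistic particle in the potential V(q) = q**2 / 2."""
    return -M * math.sqrt(1.0 - v * v) - 0.5 * q * q

def momentum(q, v, h=1e-7):
    """p = dL/dv at fixed q, by a central finite difference in the velocity slot."""
    return (lagrangian(q, v + h) - lagrangian(q, v - h)) / (2 * h)

def velocity_from_momentum(q, p, iters=100):
    """Solve p = dL/dv(q, v) for v by bisection.

    This works because L is convex in v, so dL/dv is increasing; the bracket
    stays strictly inside |v| < 1, where L is defined."""
    lo, hi = -0.999999, 0.999999
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if momentum(q, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def hamiltonian(q, p):
    """Landau's (40.2), H = p*v - L, after eliminating v in favor of (q, p)."""
    v = velocity_from_momentum(q, p)
    return p * v - lagrangian(q, v)

# For this Lagrangian the closed form, worked out by hand, is H = sqrt(M**2 + p**2) + q**2 / 2.
print(hamiltonian(0.3, 0.75), math.sqrt(M ** 2 + 0.75 ** 2) + 0.5 * 0.3 ** 2)
```

The two printed numbers should agree closely; the point is that `hamiltonian` cannot even be evaluated without first solving for $\dot{q}$, which (40.2) never mentions.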

Indeed, serious ambiguities arise when you start trying to literally interpret a quantity with differentials in terms of different variables, some of which are independent and some of which are not. (Remember, when starting from a Lagrangian defined on $(q, \dot{q})$ space, $p = \partial L / \partial \dot{q}$ is generically a function of *both* $q$ and $\dot{q}$.^{d }) And why do we use (40.2) rather than, say, $H = \sum_i p_i \dot{q}_i + L$? Landau doesn’t say, but he does derive Hamilton’s equations a couple of lines later and it’s clear we wouldn’t get the proper cancelation of differentials with a different choice. In other words… look at how pretty Hamilton’s equations are and stop asking questions!

At this point, the typical advice given to the impudently inquisitive student is to look at V.I. Arnold’s *Mathematical Methods of Classical Mechanics*, often described as the definitive, mathematically rigorous treatment…that few bother to read carefully. Here, at least, we find an actual definition in generality [2nd edition, page 61]^{e }:

Let $y = f(x)$ be a convex function, $f''(x) > 0$.

The *Legendre transformation* of the function $f$ is a new function $g$ of a new variable $p$, which is constructed in the following way (Figure 43). We draw the graph of $f$ in the $x,y$ plane. Let $p$ be a given number. Consider the straight line $y = px$. We take the point $x = x(p)$ at which the curve is farthest from the straight line in the vertical direction: for each $p$ the function $px - f(x) = F(p, x)$ has a maximum with respect to $x$ at the point $x(p)$. Now we define $g(p) = F(p, x(p))$.

The point $x(p)$ is defined by the extremal condition $\partial F / \partial x = 0$, i.e., $f'(x) = p$. Since $f$ is convex, the point $x(p)$ is unique [if it exists].
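Arnold’s recipe is at least directly computable. A minimal sketch, assuming $f$ is convex and that the maximizing $x$ lies inside a hypothetical search window $[-50, 50]$: because $px - f(x)$ is then concave in $x$, a ternary search finds the point $x(p)$ farthest from the line $y = px$. The name `legendre_by_max` is made up for illustration.

```python
def legendre_by_max(f, p, lo=-50.0, hi=50.0, iters=200):
    """Arnold's construction: return (g(p), x(p)) with g(p) = max_x [p*x - f(x)].

    Assumes f is convex, so F(x) = p*x - f(x) is concave and ternary search
    converges, and that the maximizer lies in the window [lo, hi]."""
    def F(x):
        return p * x - f(x)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if F(m1) < F(m2):
            lo = m1  # for concave F the maximum cannot lie left of m1
        else:
            hi = m2  # ... nor right of m2
    x_p = 0.5 * (lo + hi)
    return F(x_p), x_p

# f(x) = x**2 / 2 is its own Legendre transform: g(p) = p**2 / 2 and x(p) = p.
g, x_p = legendre_by_max(lambda x: 0.5 * x * x, 3.0)
print(g, x_p)
```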

If we were to condense this prescription down, we’d get this rather ugly definition for the Legendre transform $g$ of a convex function $f$:

$$g(p) = p\,(f')^{-1}(p) - f\big((f')^{-1}(p)\big). \tag{1}$$

The convexity specified by Arnold guarantees that $(f')^{-1}$ is well-defined, so at long last we have a clear definition. But the meaning of the transform, and especially the fact that it is its own inverse (i.e., an involution), are completely obscured.
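Eq. (1) can be evaluated just as mechanically, by inverting $f'$ numerically. A sketch, assuming $f'$ is continuous and strictly increasing (i.e., $f$ is strictly convex) and that $p$ lies in its range; `inverse_increasing` and `legendre_eq1` are hypothetical helper names. For $f(x) = e^x$ it should reproduce $g(p) = p \ln p - p$.

```python
import math

def inverse_increasing(h, y, lo=-1.0, hi=1.0, iters=200, max_doublings=60):
    """Solve h(x) = y for a continuous, strictly increasing h:
    widen [lo, hi] until it brackets the solution, then bisect."""
    for _ in range(max_doublings):
        if h(lo) <= y:
            break
        lo *= 2.0  # lo starts negative, so this moves it further left
    for _ in range(max_doublings):
        if h(hi) >= y:
            break
        hi *= 2.0  # hi starts positive, so this moves it further right
    if not (h(lo) <= y <= h(hi)):
        raise ValueError("could not bracket a solution; is y in the range of h?")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def legendre_eq1(f, fprime, p):
    """Eq. (1): g(p) = p * x - f(x), evaluated at x = (f')^{-1}(p)."""
    x = inverse_increasing(fprime, p)
    return p * x - f(x)

p = 2.0
print(legendre_eq1(math.exp, math.exp, p), p * math.log(p) - p)
```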

Stare at Arnold’s Figure 43 for a while. Remember: The Legendre transformation links Lagrangian and Hamiltonian mechanics, the two most important formulations of both classical *and* quantum physics. **This transformation binds together the fundamental operating system of the universe, on which all the other physical theories, like electromagnetism and gravity, run merely as programs.**^{f } Do you feel like you understand it? Have you reduced the transformation to its essence?

#### The alternative

Let’s try one more definition, which I originally noticed buried near the bottom of a Wikipedia page. **Two convex functions $f$ and $g$ are Legendre transforms of each other when their first derivatives are inverse functions**:

$$\boxed{f' = (g')^{-1}}$$

*What*? That’s it?!
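The boxed condition is also easy to test numerically. A minimal sketch, assuming the hand-derived pair $f(x) = e^x$ and $g(p) = p \ln p - p$ (for $p > 0$) and central finite differences with an arbitrarily chosen step size; it checks $g'(f'(x)) = x$ at a few sample points, which is the same statement as $f' = (g')^{-1}$.

```python
import math

def f(x):
    return math.exp(x)

def g(p):
    # Candidate Legendre transform of exp, defined for p > 0 (by hand, g'(p) = ln p).
    return p * math.log(p) - p

def derivative(h, x, step=1e-5):
    """Central finite-difference approximation of h'(x)."""
    return (h(x + step) - h(x - step)) / (2 * step)

def derivatives_are_inverse(f, g, xs, tol=1e-6):
    """Check the boxed condition f' = (g')^{-1} via g'(f'(x)) == x at sample points."""
    return all(abs(derivative(g, derivative(f, x)) - x) < tol for x in xs)

samples = [-1.0, 0.0, 0.5, 2.0]
print(derivatives_are_inverse(f, g, samples))                     # the pair above
print(derivatives_are_inverse(f, lambda p: g(p) + 7.0, samples))  # shifted by a constant
```

Shifting $g$ by a constant leaves the check unchanged, since only derivatives enter the condition.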

To confirm that this is equivalent to the above definitions, we solve for $g$ (up to an additive constant) by taking the inverse function and computing its anti-derivative:

$$g(p) = \int (f')^{-1}(p)\,dp.$$

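The graph of an inverse function is just the graph of the original function flipped about the 45° line, so the integral of an inverse function (area below the curve) plus the integral of the original function (area left of the curve) must equal the bounding rectangle (see figure). Taking the curve to pass through the origin, $f'(0) = 0$,

$$\int_0^{x} f'(x')\,dx' + \int_0^{p} (f')^{-1}(p')\,dp' = xp \qquad \text{for } p = f'(x). \tag{2}$$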

(This is just an oblique use of integration by parts, which is discussed in more detail elsewhere.^{g })
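A quick numerical check of Eq. (2), as a sketch: the example $f'(x) = x^3$ (so the curve passes through the origin and $(f')^{-1}(p) = p^{1/3}$ for $p \ge 0$) and the midpoint-rule resolution are hypothetical choices. The two areas should add up to the rectangle $xp$.

```python
def midpoint_integral(h, a, b, n=100000):
    """Approximate the integral of h over [a, b] with the midpoint rule."""
    width = (b - a) / n
    return width * sum(h(a + (k + 0.5) * width) for k in range(n))

def fprime(x):
    return x ** 3  # hypothetical example with f'(0) = 0

def fprime_inverse(p):
    return p ** (1.0 / 3.0)  # inverse of x**3, valid for p >= 0

def rectangle_gap(x):
    """Eq. (2) rearranged: (integral of f') + (integral of its inverse) - x*p."""
    p = fprime(x)
    return midpoint_integral(fprime, 0.0, x) + midpoint_integral(fprime_inverse, 0.0, p) - x * p

print(rectangle_gap(1.5), rectangle_gap(0.7))  # both should be close to zero
```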

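The above equation holds for each pair $(x, p)$ satisfying $p = f'(x)$, so, taking $f$ and $g$ to be the anti-derivatives appearing in Eq. (2), we can explicitly confirm that

$$g(p) = p\,(f')^{-1}(p) - f\big((f')^{-1}(p)\big), \tag{3}$$

in agreement with Eq. (1).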

It is also clear from looking at a graph that taking a function inverse is symmetric (i.e., $f' = (g')^{-1}$ if and only if $g' = (f')^{-1}$), so our boxed definition makes it manifest that the Legendre transform is an involution on the set of convex (and concave) functions. In Arnold’s Figure 43 this symmetry is, shall we say, less obvious.^{h }
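The involution can also be seen numerically. A sketch that turns Arnold’s maximization (a ternary search on a hypothetical window $[-50, 50]$) into a map from functions to functions and applies it twice to the strictly convex example $f(x) = x^4/4 + x^2/2$; the helper name `legendre` is illustrative.

```python
def legendre(f, lo=-50.0, hi=50.0, iters=150):
    """Return the function g(p) = max_x [p*x - f(x)].

    Assumes f is convex and that every maximizer we need lies in [lo, hi]."""
    def g(p):
        a, b = lo, hi
        for _ in range(iters):
            m1 = a + (b - a) / 3
            m2 = b - (b - a) / 3
            if p * m1 - f(m1) < p * m2 - f(m2):
                a = m1
            else:
                b = m2
        x = 0.5 * (a + b)
        return p * x - f(x)
    return g

def f(x):
    return 0.25 * x ** 4 + 0.5 * x ** 2  # hypothetical strictly convex example

g = legendre(f)          # first transform
f_again = legendre(g)    # transforming again should give back f
print(f_again(0.7), f(0.7))
```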

Hold on, you might object: the above boxed definition only fixes the Legendre transformation up to an additive constant, in contrast to Goldstein, Landau, Arnold, and Eq. (2). How will we determine the absolute values of the Hamiltonian and the Lagrangian? …oh wait, *those aren’t physically meaningful.* All of the dynamical laws are constructed from derivatives of $L$ and $H$, and we decline to specify an additive constant for the same reason we do so with conservative potentials^{i } and, more generally, anti-derivatives.

#### The transform as used in mechanics

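Let’s apply our new characterization of the Legendre transform within mechanics to compare the different approaches. We switch between the Lagrangian and Hamiltonian formulations by changing just the “kinetic” variable (i.e., swapping velocity and momentum), while keeping the configuration coordinates fixed. So let’s introduce functional notation $\partial_i$ and $(\cdot)^{-1_i}$, with $i \in \{1, 2\}$, for taking the derivative and inverse with respect to just the first (configuration) or second (kinetic) variable, for a fixed value of the other.^{j } Then the Legendre transform of the kinetic variable determines the gradient in that direction,

$$\boxed{\partial_2 H = (\partial_2 L)^{-1_2}},$$

while the gradients in the configuration direction are tied together by

$$\boxed{\partial_1 H(q, p) = -\partial_1 L\big(q, \partial_2 H(q, p)\big)}.$$

^{k }

The involution is *almost* manifest here; this last boxed equation is equivalent to $\partial_1 L(q, \dot{q}) = -\partial_1 H\big(q, \partial_2 L(q, \dot{q})\big)$ because $\partial_2 L(q, \cdot)$ and $\partial_2 H(q, \cdot)$ are inverse functions on $\mathbb{R}$.^{l }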

The two boxed equations above define the relationship between the Lagrangian and Hamiltonian through the Legendre transform of the kinetic coordinates. **The first box says the gradients of $L$ and $H$ in the kinetic direction are inverses of each other. The second box says the gradients in the configuration direction are negatives of each other once you account for the change of kinetic variables.**
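Here is a minimal numerical sketch of the two boxed conditions, again for a hypothetical relativistic example $L(q, \dot{q}) = -m\sqrt{1 - \dot{q}^2} - q^2/2$ (units with $c = 1$), whose Hamiltonian, worked out by hand, is $H(q, p) = \sqrt{m^2 + p^2} + q^2/2$. The helpers `d1` and `d2` mimic the $\partial_1$, $\partial_2$ notation with central finite differences; the step size and tolerance are arbitrary choices. As a bonus it checks the mnemonic $H + L = p\dot{q}$.

```python
import math

M = 1.0  # hypothetical mass; units with c = 1

def L(q, v):
    return -M * math.sqrt(1.0 - v * v) - 0.5 * q * q

def H(q, p):
    return math.sqrt(M * M + p * p) + 0.5 * q * q

def d1(F, q, x, h=1e-6):
    """Partial derivative in the first (configuration) slot, second slot fixed."""
    return (F(q + h, x) - F(q - h, x)) / (2 * h)

def d2(F, q, x, h=1e-6):
    """Partial derivative in the second (kinetic) slot, first slot fixed."""
    return (F(q, x + h) - F(q, x - h)) / (2 * h)

def check_boxes(q, v, tol=1e-6):
    p = d2(L, q, v)                                   # momentum p = d2 L(q, v)
    kinetic = abs(d2(H, q, p) - v) < tol              # first box: d2 H = (d2 L)^{-1}
    configuration = abs(d1(H, q, p) + d1(L, q, d2(H, q, p))) < tol  # second box
    mnemonic = abs(H(q, p) + L(q, v) - p * v) < tol   # H + L = p * v
    return kinetic, configuration, mnemonic

print(check_boxes(0.3, 0.6))
print(check_boxes(-1.2, -0.8))
```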

Now, after you understand this, it’s of course perfectly fine to use the mnemonic $H + L = p\dot{q}$, along with $p = \partial L / \partial \dot{q}$ and $\dot{q} = \partial H / \partial p$, to remember how to quickly compute the Hamiltonian from the Lagrangian, or vice versa. But those cartoon equations suppress all the important structure that tells you what is actually going on, and the equal footing of $H$ and $L$ merely gestures at the symmetry of the transformation. In particular, the fact that you can uniquely solve for either function in terms of their respective pair of independent variables is guaranteed by the tacit assumption of convexity. More philosophically, only the gradients of these functions, not their actual values, are physically meaningful.

OK, so that’s basically all we need to know about the transform to discuss the terribleness of physics textbooks in the forthcoming blog post. In this next and final section, I’ll briefly generalize the transform a bit for added context, but consider it optional reading.

#### Multiple variables and non-convex functions

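The Legendre transform can be extended to multivariate functions $f(\vec{x})$ and $g(\vec{p})$, with $\vec{x}, \vec{p} \in \mathbb{R}^n$, without any surprises. One clear definition is

$$g(\vec{p}) = \vec{p} \cdot (\nabla f)^{-1}(\vec{p}) - f\big((\nabla f)^{-1}(\vec{p})\big), \tag{4}$$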

where the inverse function $(\nabla f)^{-1}$ exists because $\nabla f$ is a bijection on $\mathbb{R}^n$ when we assume (as we must) that $f$ is convex. In the approach advocated above, this is restated more elegantly as

$$\boxed{\nabla g = (\nabla f)^{-1}},$$

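where, again, the involutivity is manifest. For switching between Lagrangians and Hamiltonians, we use the defining conditions

$$\nabla_2 H = (\nabla_2 L)^{-1_2}, \qquad \nabla_1 H(\vec{q}, \vec{p}) = -\nabla_1 L\big(\vec{q}, \nabla_2 H(\vec{q}, \vec{p})\big), \tag{5}$$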

where the subscripts $1$ and $2$ now refer respectively to groups of configuration and kinetic variables in the natural way. The interpretation is the same: the kinetic gradients are inverses, and the configuration gradients are negatives.
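For a multivariate sanity check, a sketch assuming the hypothetical quadratic $f(\vec{x}) = \tfrac12 \vec{x} \cdot A\vec{x}$ with a symmetric positive-definite $2 \times 2$ matrix $A$, so that $\nabla f(\vec{x}) = A\vec{x}$ and Eq. (4) gives $g(\vec{p}) = \tfrac12 \vec{p} \cdot A^{-1}\vec{p}$. The code evaluates Eq. (4) directly, takes a finite-difference gradient of the result, and confirms $\nabla g(\nabla f(\vec{x})) = \vec{x}$; plain lists stand in for vectors to keep it dependency-free.

```python
A = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical symmetric positive-definite matrix

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

A_INV = inverse_2x2(A)

def f(x):
    return 0.5 * dot(x, matvec(A, x))

def grad_f(x):
    return matvec(A, x)

def g(p):
    """Eq. (4): g(p) = p . x - f(x) with x = (grad f)^{-1}(p) = A^{-1} p."""
    x = matvec(A_INV, p)
    return dot(p, x) - f(x)

def grad(h, p, step=1e-6):
    """Central finite-difference gradient of a function of two variables."""
    return [(h([p[0] + step, p[1]]) - h([p[0] - step, p[1]])) / (2 * step),
            (h([p[0], p[1] + step]) - h([p[0], p[1] - step])) / (2 * step)]

x = [0.3, -1.1]
print(grad(g, grad_f(x)), x)  # grad g undoes grad f, as the boxed condition says
```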

One can generalize (4) to non-convex functions as

$$g(\vec{p}) = \sup_{\vec{x}} \big[\vec{p} \cdot \vec{x} - f(\vec{x})\big], \tag{6}$$

where the supremum is of course just the formal way of talking about a maximum over all choices of $\vec{x}$. When $f$ is convex and smooth, the right-hand side is maximized under the condition that $\vec{p} = \nabla f(\vec{x})$, so we recover (4). But more generally, this formula ensures the transform is well defined for any $f$ (allowing the value $+\infty$), and the output is convex regardless.^{m } Basically, we’ve defined a way of breaking the ambiguity that comes when $\nabla f$ doesn’t have a unique inverse, and it turns out that this judicious choice ensures $g$ is the Legendre transform of the convex hull of $f$! In other words, applying the Legendre transform *twice* acts as the identity on convex functions but produces the convex hull of non-convex ones.^{n }
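The non-convex case is easy to explore with a discretized version of Eq. (6), where the supremum becomes a maximum over grid points. A sketch, assuming a hypothetical double-well $f(x) = (x^2 - 1)^2$ and grids chosen so that the relevant slopes ($|f'| \le 24$ on $[-2, 2]$) are covered: transforming twice flattens the well into its convex hull, which is $0$ on $[-1, 1]$, while leaving $f$ unchanged for $|x| \ge 1$.

```python
def conjugate_on_grid(xs, fs, ps):
    """Discrete version of Eq. (6): g(p) = max over grid points x of [p*x - f(x)]."""
    return [max(p * x - fx for x, fx in zip(xs, fs)) for p in ps]

def double_well(x):
    return (x * x - 1.0) ** 2  # hypothetical non-convex example

xs = [-2.0 + 4.0 * k / 400 for k in range(401)]    # grid on [-2, 2], spacing 0.01
ps = [-30.0 + 60.0 * k / 600 for k in range(601)]  # slopes on [-30, 30], spacing 0.1
fs = [double_well(x) for x in xs]

gs = conjugate_on_grid(xs, fs, ps)       # first transform (convex in p)
f_twice = conjugate_on_grid(ps, gs, xs)  # second transform, back on the x grid

for k in (200, 300, 350):  # x = 0.0, 1.0, 1.5
    print(xs[k], fs[k], f_twice[k])
```

At $x = 0$ the original function equals $1$ but the twice-transformed one is $0$, the floor of the convex hull; at $x = 1.5$ the two agree.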

*[I thank Robert Lasenby and Godfrey Miller for discussion.]*

### Footnotes


- I was pleased to note as this essay went to press that my choices of Landau, Goldstein, and Arnold were confirmed as the “standard” suggestions by the top Google results.
- If not, can you remember the definition? I couldn’t, a month ago.
- Landau & Lifshitz, Hand & Finch, Rossberg, plus a bunch I hadn’t heard of before.
- To keep my blood pressure in check, I’m just going to skip completely over the fact that this Hamiltonian has $t$ dependence. Almost all physics textbooks fail to clearly explain to the student why we can nevertheless indulge in the sin of pretending that $\dot{q}$ is an independent variable from $q$, attributable perhaps to the mystery of faith. For clarity on this, I suggest picking up Gelfand and Fomin’s “Calculus of variations”, which is well regarded and available on Amazon for $9.
- A similar precise but opaque definition is given in the text by Jose and Saletan, which was inflicted upon me in graduate school. Figures like the one below can be found in Hand & Finch, and in various notes floating around that promise an “easy introduction” to the Legendre transform.
- Scott Aaronson, PHYS771 Lecture 9: Quantum: ‘So, what is quantum mechanics? Even though it was discovered by physicists, it’s not a physical theory in the same sense as electromagnetism or general relativity. In the usual “hierarchy of sciences” — with biology at the top, then chemistry, then physics, then math — quantum mechanics sits at a level between math and physics that I don’t know a good name for. Basically, quantum mechanics is the operating system that other physical theories run on as application software (with the exception of general relativity, which hasn’t yet been successfully ported to this particular OS). There’s even a word for taking a physical theory and porting it to this OS: “to quantize”.’
Note, by “the level between physics and math”, Aaronson is talking about the basic information theoretic underpinning of quantum mechanics, whose classical analog is probability theory (which, classically, is generally assumed without discussion). In this post I am talking about the abstract formalism of Lagrangian/Hamiltonian mechanics, which sits above quantum/classical information theory but still below (i.e., more fundamental than) particular physical theories like electromagnetism or gravity.

- Thanks to Godfrey Miller for this link.
- See also Arnold’s Figure 45, where he tries (in my opinion, hopelessly) to make the involution property intuitive.
- By the way, you weren’t fooled by the Aharonov-Bohm effect into thinking that the potential is more “real” in quantum mechanics, were you?
- Here I’m adopting something close to the functional notation of Sussman and Wisdom.
- We actually only need this equation to hold at a single value of the kinetic variable, since the previous boxed equation automatically extends it to all values. What’s happening here is this: although the additive offset we choose when transforming the kinetic variable ($\dot{q} \leftrightarrow p$) is arbitrary, we need to choose the *same* offset for different values of the configuration coordinate ($q$), since the configuration gradient is physically important.
- At a more advanced level this is known to map between the tangent and cotangent bundles, so the domain and range are isomorphic but not strictly the same.
- Indeed, generalizing the Legendre transform to non-convex functions, where it is usually known as the convex conjugate, is a justification for using a definition like Eq. (1) rather than $f' = (g')^{-1}$. See here for more, as well as a nice description in terms of hyperplanes.
- I thank Kristan Temme for emphasizing this non-mechanical aspect of the Legendre transform to me.

Thank you for this post!

Making use of slightly more abstract mathematical concepts to truly understand stuff that you are normally taught only in a “practical” way is one of the most satisfying feelings I have while studying physics.

Thank you for this! Poor explanations survive because nobody wants to admit they found the greats (L&L, Goldstein…) confusing!

I remember seeing a picture like yours (areas under and to the left of the curve) in Hilbert & Courant’s “Mathematical methods of Physics”, probably the chapter on the calculus of variations.

I’m looking at Volume 1 of the 1989 edition and I can’t find a picture. The discussion of the Legendre transform is in Section IV.9 (p. 231 – 242).