Legendre transform

The way that most physicists teach and talk about partial differential equations is horrible, and has surprisingly big costs for the typical understanding of the foundations of the field even among professionals. The chief victims are students of thermodynamics and analytical mechanics, and I’ve mentioned before that the preface of Sussman and Wisdom’s Structure and Interpretation of Classical Mechanics is a good starting point for thinking about these issues. As a pointed example, in this blog post I’ll look at how badly the Legendre transform is taught in standard textbooks,[1] and compare it to how it could be taught. In a subsequent post, I’ll use this as a springboard for complaining about the way we record and transmit physics knowledge.

Before we begin: turn away from the screen and see if you can remember what the Legendre transform accomplishes mathematically in classical mechanics.[2] I don’t just mean that the Legendre transform converts the Lagrangian into the Hamiltonian and vice versa, but rather: what key mathematical/geometric property does the Legendre transform have, compared to the cornucopia of other function transforms, that allows it to connect these two conceptually distinct formulations of mechanics?

(Analogously, the question “What is useful about the Fourier transform for understanding translationally invariant systems?” can be answered by something like “Translationally invariant operations in the spatial domain correspond to multiplication in the Fourier domain” or “The Fourier transform is a change of basis, within the vector space of functions, using translationally invariant basis elements, i.e., the Fourier modes”.)

The status quo

Let’s turn to the canonical text by Goldstein for an example of how the Legendre transform is usually introduced. After a passable explanation of why one might want to move from a second-order equation of n variables to a first-order equation of 2n variables, we hear this [3rd edition, page 335]:

Treated strictly as a mathematical problem, the transition from Lagrangian to Hamiltonian formulation corresponds to the changing of variables in our mechanical functions from (q,\dot{q},t) to (q,p,t) by [ p = \partial L(q,\dot{q},t)/\partial \dot{q} ]. The procedure for switching variables in this manner is provided by the Legendre transformation, which is tailored for just this type of change of variable.

Consider a function of only two variables f(x,y), so that a differential of f has the form

(8.3)   \begin{align*} df = u dx + vdy \end{align*}

where

(8.4)   \begin{align*} u = \frac{\partial f}{\partial x},\qquad v = \frac{\partial f}{\partial y}. \end{align*}

We wish now to change the basis of description from x,y to a new distinct set of variables u,y, so that differential quantities are expressed in terms of differentials du and dy. Let g be a function of u and y defined by the equation

(8.5)   \begin{align*} g = f-ux. \end{align*}

A differential of g is then given as

    \[dg = df-udx -xdu.\]

or, by (8.3), as

    \[dg = vdy-xdu\]

which is exactly in the form desired. The quantities x and v are now functions of the variables u and y given by the relations

(8.6)   \begin{align*} x = -\frac{\partial g}{\partial u},\qquad v = \frac{\partial g}{\partial y}. \end{align*}

which are analogues of Eqs. (8.4).

The Legendre transformation so defined is used frequently in thermodynamics. The first law of thermodynamics…

Huh? Did you see a clean definition of a function transform in there? g is supposed to be a function of u and y, but the right-hand side of (8.5) has x dependence. Can we always find a way to eliminate x for arbitrary f? What does it mean when we can’t, or there are multiple solutions? And in what sense can u become a variable independent of y if its definition, u = \partial f(x,y)/\partial x, depends on y? Contrast this to the Fourier, special conformal, or Laplace transforms, which are unambiguous ways to convert a function of one variable to a function of another.
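
To make the multiple-solutions worry concrete, here is an illustrative toy case (chosen for this note, not taken from Goldstein): let f(x,y) = x^3/3, which is not convex in x, and follow (8.4) and (8.5) literally.

    \[u = \frac{\partial f}{\partial x} = x^2 \quad\Rightarrow\quad x = \pm\sqrt{u}, \qquad g = f - ux = -\tfrac{2}{3}x^3 = \mp\tfrac{2}{3}u^{3/2}.\]

So g comes out two-valued for u>0 and undefined for u<0, and nothing in the quoted passage says which branch to take.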

If you can reconstruct a clean definition using this quote, it will be ugly and you will do so by implicitly drawing on your previously obtained knowledge of when one can and cannot treat variables as independent (knowledge that is not accessible to the student reader) and by making assumptions that are true for physical Lagrangians but not true generally (surprise! f has to be convex in x). And the motivation for the definition — beyond merely “look at how pretty Hamilton’s equations turn out to be” — will still be opaque.

It would be bad enough if this was just Goldstein because that book is, to my knowledge, the most widely used mechanics textbook, presumably representing the level of clarity achieved by the modal physicist. But I sat down in the library where the classical mechanics books are kept and flipped through seven or eight more,[3] and they were as bad or worse. The venerable textbook by Landau, for instance, uses the same ambiguous differential notation and declines to explain what the Legendre transform is in general; rather, it just declares a formula for the Hamiltonian in terms of the Lagrangian [Vol. 1, 3rd edition, page 131]:

(40.2)   \begin{align*} H(p,q,t) = \sum_i p_i\dot{q}_i -L \end{align*}

Notice how the functional parameters are written for the Hamiltonian but not the Lagrangian? It is an impressive sleight of hand designed to distract you from the weird fact that this definition implicitly requires inverting the equation p=\partial L(q,\dot{q},t)/\partial \dot{q} by solving for \dot{q} in terms of q, p, and t, and then inserting back into Eq. (40.2).
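
Spelled out for the simplest case, a single particle with L = \frac{1}{2}m\dot{q}^2 - V(q) (a standard illustrative choice, not an example from Landau's page), the hidden steps look like this:

    \[p = \frac{\partial L}{\partial \dot{q}} = m\dot{q} \quad\Rightarrow\quad \dot{q} = \frac{p}{m}, \qquad H(p,q) = p\,\frac{p}{m} - \left[\frac{1}{2}m\left(\frac{p}{m}\right)^2 - V(q)\right] = \frac{p^2}{2m} + V(q).\]

Here the inversion is trivial because p is linear in \dot{q}; for a general Lagrangian it need not be.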

Indeed, serious ambiguities arise when you start trying to literally interpret a quantity with differentials in terms of different variables, some of which are independent and some of which are not. (Remember, when starting from a Lagrangian defined on (x,v) space, p is generically a function of both x and v.[4]) And why do we use (40.2) rather than, say, \tilde{H}= 2\sum_i p_i\dot{q}_i +L? Landau doesn’t say, but he does derive Hamilton’s equations a couple of lines later and it’s clear we wouldn’t get the proper cancelation of differentials with a different choice. In other words…look at how pretty Hamilton’s equations are and stop asking questions!

At this point, the typical advice given to the impudently inquisitive student is to look at V.I. Arnold’s Mathematical Methods of Classical Mechanics, often described as the definitive, mathematically rigorous treatment…that few bother to read carefully. Here, at least, we find an actual definition in generality [2nd edition, page 61][5]:

Let y=f(x) be a convex function, f''(x)>0.

The Legendre transformation of the function f is a new function g of a new variable p, which is constructed in the following way (Figure 43). We draw the graph of f in the x,y plane. Let p be a given number. Consider the straight line y=px. We take the point x=x(p) at which the curve is farthest from the straight line in the vertical direction: for each p the function px - f(x) =F(p,x) has a maximum with respect to x at the point x(p). Now we define g(p) = F(p,x(p)).

The point x(p) is defined by the extremal condition \partial F/\partial x = 0, i.e., f'(x)=p. Since f is convex, the point x(p) is unique [if it exists].



If we were to condense this prescription down, we’d get this rather ugly definition for the Legendre transform g of a convex function f:

(1)   \begin{align*} g(y) = xy - f(x) \big\vert_{x = (f')^{-1}(y)} \end{align*}

The convexity specified by Arnold guarantees that (f')^{-1} is well-defined, so at long last we have a clear definition. But the meaning of the transform, and especially the fact that it is its own inverse (i.e., an involution), are completely obscured.
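
For readers who want to poke at Eq. (1) directly, here is a minimal numerical sketch. Everything in it is an assumption made for illustration rather than anything from Arnold: the helper names `invert_increasing` and `legendre`, the bisection bracket [-50, 50], and the test function f(x) = cosh x, for which the transform works out to g(y) = y asinh(y) - sqrt(1 + y^2).

```python
import math


def invert_increasing(fprime, y, lo=-50.0, hi=50.0, iters=200):
    """Solve fprime(x) = y by bisection, assuming fprime is increasing on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fprime(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def legendre(f, fprime, y):
    """Eq. (1): g(y) = x*y - f(x), evaluated at x = (f')^{-1}(y)."""
    x = invert_increasing(fprime, y)
    return x * y - f(x)


# Illustrative convex function: f = cosh, so f' = sinh and (f')^{-1} = asinh.
g_numeric = legendre(math.cosh, math.sinh, 2.0)
g_exact = 2.0 * math.asinh(2.0) - math.sqrt(1.0 + 2.0**2)
print(g_numeric, g_exact)
```

Note that the only nontrivial step is the inversion of f', which is exactly the step the textbook presentations tend to hide.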

Stare at that figure for a while. Remember: The Legendre transformation links Lagrangian and Hamiltonian mechanics, the two most important formulations of both classical and quantum physics. This transformation binds together the fundamental operating system of the universe, on which all the other physical theories, like electromagnetism and gravity, run merely as programs.[6] Do you feel like you understand it? Have you reduced the transformation to its essence?

The alternative

Let’s try one more definition, which I originally noticed buried near the bottom of a Wikipedia page. Two convex functions f and g are Legendre transforms of each other when their first derivatives are inverse functions:

    \[\boxed{g'=(f')^{-1}.}\]

Wait, what? That’s it?!

To confirm that this is equivalent to the above definitions, we solve for g (up to an additive constant) by taking the inverse function (f')^{-1} and computing its anti-derivative:

    \[g(x) = \int_0^x\!d\tilde{x}\, (f')^{-1}(\tilde{x}).\]

The graph of an inverse function is just the graph of the original function flipped about the 45° line, so the integral of an inverse function (area below the curve) plus the integral of the original function (area left of the curve) must equal the bounding rectangle (see figure):

(2)   \begin{align*} xy&=\int_0^x\!d\tilde{x}\, (f')^{-1}(\tilde{x}) + \int_0^y\!d\tilde{y}\, f'(\tilde{y}) \\ &=g(x) + f(y) \end{align*}

(This is just an oblique use of integration by parts, which is discussed more here.[7])



The above equation holds for each pair (x,y) satisfying y=(f')^{-1}(x), so we can explicitly confirm that

(3)   \begin{align*} g(x) &= xy-f(y)\Big\vert_{y = (f')^{-1}(x)} \\ &= x\cdot(f')^{-1}(x)-f\big((f')^{-1}(x)\big). \end{align*}
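
The area argument can be checked numerically. The sketch below is only an illustration under stated assumptions: f = cosh is a stand-in convex function (chosen because f'(0) = 0, so the curve passes through the corner of the rectangle), and `integrate`, `g_area`, and `g_formula` are hypothetical helper names. It compares the anti-derivative of (f')^{-1} with the closed expression in Eq. (3).

```python
import math


def integrate(h, a, b, n=20000):
    """Midpoint-rule approximation of the integral of h from a to b."""
    step = (b - a) / n
    return step * sum(h(a + (k + 0.5) * step) for k in range(n))


f = math.cosh              # illustrative convex function with f'(0) = 0
f_prime_inv = math.asinh   # inverse of f' = sinh


def g_area(x):
    """g as the area under the graph of (f')^{-1} from 0 to x."""
    return integrate(f_prime_inv, 0.0, x)


def g_formula(x):
    """g from Eq. (3): x*y - f(y) at y = (f')^{-1}(x)."""
    y = f_prime_inv(x)
    return x * y - f(y)


# The two versions differ by the same constant at every x.
offsets = [g_area(x) - g_formula(x) for x in (0.5, 1.0, 2.0, 3.0)]
print(offsets)
```

For this choice of f the offset is f(0), because the area to the left of the curve is f(y) - f(0) rather than f(y); the two expressions for g agree up to that additive constant.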

It is also clear from looking at a graph that a function inverse is symmetric (i.e., f'=(g')^{-1} \Leftrightarrow g'=(f')^{-1} ), so our boxed definition makes it manifest that the Legendre transform is an involution on the set of convex (and concave) functions. In Arnold’s Figure 43 this symmetry is, shall we say, less obvious.[8]
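
The involution property is easy to test by brute force. This is a rough sketch, not a robust implementation: derivatives are estimated by central differences, the inverse is found by bisection on hand-picked brackets, and f = cosh is again just an illustrative convex function. The names `deriv`, `legendre`, and `f_again` are hypothetical.

```python
import math


def deriv(h, x, eps=1e-5):
    """Central-difference estimate of h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)


def legendre(h, lo, hi, iters=100):
    """Return g(y) = x*y - h(x) at the x in [lo, hi] where h'(x) = y (h assumed convex)."""
    def g(y):
        a, b = lo, hi
        for _ in range(iters):
            mid = 0.5 * (a + b)
            if deriv(h, mid) < y:
                a = mid
            else:
                b = mid
        x = 0.5 * (a + b)
        return x * y - h(x)
    return g


f = math.cosh                        # illustrative convex function
g = legendre(f, -5.0, 5.0)           # slopes of f on [-5, 5] cover roughly [-74, 74]
f_again = legendre(g, -50.0, 50.0)   # transform the transform

round_trip_errors = [abs(f_again(x) - f(x)) for x in (-2.0, -0.5, 0.0, 1.0, 2.0)]
print(max(round_trip_errors))
```

The same routine is applied both times, with nothing special done on the way back, which is the point: the transform is its own inverse.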

Hold on, you might object, the above boxed definition only fixes the Legendre transformation up to an additive constant, in contrast to Goldstein, Landau, Arnold, and Eq. (2). How will we determine the absolute values of the Hamiltonian and the Lagrangian? …oh wait, those aren’t physically meaningful. All of the dynamical laws are constructed from derivatives of H and L, and we decline to specify an additive constant for the same reason we do so with conservative potentials[9] and, more generally, anti-derivatives.

The transform as used in mechanics

Let’s apply our new characterization of the Legendre transform within mechanics to compare the different approaches. We switch between the Lagrangian and Hamiltonian formulations by changing just the “kinetic” variable (i.e., swapping velocity and momentum), while keeping the configuration coordinates fixed. So let’s introduce functional notation \partial_i and \mathcal{V}_i with i=1,2 for taking the derivative and inverse with respect to just the first (configuration) or second (kinetic) variable, for fixed value of the other.[10] Then the Legendre transform of the kinetic variable determines the gradient in that direction,

    \[\boxed{\partial_2 H = \mathcal{V}_2 \partial_2 L,}\]

which is just the analog of g' = (f')^{-1}. We also need to set the derivative in the configuration direction, and for this we introduce the “configuration-variable identity function” I_1(x,y) = x and the composition operator \circ on the space of two-variable functions. Our second requirement is[11]

    \[\boxed{\partial_1 H = - (\partial_1 L) \circ (I_1,\mathcal{V}_2 \partial_2 L).}\]

The symmetry between H and L is almost manifest here; this last boxed equation is equivalent to \partial_1 L = - (\partial_1 H) \circ (I_1,\mathcal{V}_2 \partial_2 H) = - (\partial_1 H) \circ (I_1,\partial_2 L) because (I_1,\partial_2 H) and (I_1,\partial_2 L) are inverse functions on \mathbb{R}\times\mathbb{R}.[12]

The two boxed equations above define the relationship between the Lagrangian and Hamiltonian through the Legendre transform of the kinetic coordinates. The first box says the gradients of H and L in the kinetic direction are inverses of each other. The second box says the gradients in the configuration direction are negatives of each other once you account for the change of kinetic variables.
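
Both boxed conditions can be checked numerically on a toy system. The sketch below makes several assumptions that are not in the text above: the Lagrangian (a particle with position-dependent mass 1 + x^2 in a quartic potential, picked so that p depends on both x and v), the finite-difference helpers `d1` and `d2`, and the bisection-based `velocity` that plays the role of \mathcal{V}_2 \partial_2 L.

```python
def d1(F, x, y, eps=1e-5):
    """Central-difference partial derivative in the first (configuration) slot."""
    return (F(x + eps, y) - F(x - eps, y)) / (2 * eps)


def d2(F, x, y, eps=1e-5):
    """Central-difference partial derivative in the second (kinetic) slot."""
    return (F(x, y + eps) - F(x, y - eps)) / (2 * eps)


def velocity(L, x, p, lo=-100.0, hi=100.0, iters=100):
    """Invert p = d2 L(x, v) for v at fixed x, assuming d2 L is increasing in v."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if d2(L, x, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def hamiltonian(L):
    """Legendre transform of the kinetic variable only: H(x, p) = p*v - L(x, v)."""
    def H(x, p):
        v = velocity(L, x, p)
        return p * v - L(x, v)
    return H


# Illustrative Lagrangian with a position-dependent mass, so p depends on x and v.
def L(x, v):
    return 0.5 * (1.0 + x * x) * v * v - 0.25 * x**4


H = hamiltonian(L)
x, p = 0.7, 1.3
v = velocity(L, x, p)

first_box = d2(H, x, p) - v               # kinetic gradients are inverses
second_box = d1(H, x, p) + d1(L, x, v)    # configuration gradients are negatives
closed_form = p * p / (2.0 * (1.0 + x * x)) + 0.25 * x**4
print(first_box, second_box, H(x, p) - closed_form)
```

All three printed numbers should be zero up to finite-difference error. Note that the second check evaluates the derivative of L at the velocity matching p, which is what the composition with (I_1, \mathcal{V}_2 \partial_2 L) encodes.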

Now, after you understand this, it’s of course perfectly fine to use the mnemonic H + L = vp, along with p=\partial_v L and v=\partial_p H, to remember how to quickly compute the Hamiltonian from the Lagrangian, or vice versa. But those cartoon equations suppress all the important structure that tells you what is actually going on, and the equal footing of H and L merely gestures at the symmetry of the transformation. In particular, the fact that you can uniquely solve for either function in terms of their respective pair of independent variables is guaranteed by the tacit assumption of convexity. More philosophically, only the gradients of these functions, not their actual values, are physically meaningful.

OK, so that’s basically all we need to know about the transform to discuss the terribleness of physics textbooks in the forthcoming blog post. In this next and final section, I’ll briefly generalize the transform a bit for added context, but consider it optional reading.

Multiple variables and non-convex functions

The Legendre transform can be extended to multivariate functions f(\vec{x}) and g(\vec{y}) without any surprises. One clear definition is

(4)   \begin{align*} g(\vec{y}) = \vec{x}\cdot \vec{y} - f(\vec{x}) \big\vert_{\vec{x} = (\vec{\partial}f)^{-1}(\vec{y})} \end{align*}

where the inverse function (\vec{\partial}f)^{-1} exists on the range of \vec{\partial}f because the gradient map is injective when we assume (as we must) that f is strictly convex. In the approach advocated above, this is restated more elegantly as

    \[\vec{\partial} g= (\vec{\partial} f)^{-1},\]

where, again, the involutivity is manifest. For switching between Lagrangians and Hamiltonians, we use the defining conditions

(5)   \begin{align*} \vec{\partial}_2 H = \vec{\mathcal{V}}_2\vec{\partial}_2 L, \qquad \qquad \vec{\partial}_1 H = - (\vec{\partial}_1 L) \circ (\vec{I}_1,\vec{\mathcal{V}}_2 \vec{\partial}_2 L).  \end{align*}

where the subscripts 1 and 2 now refer respectively to groups of configuration and kinetic variables in the natural way. The interpretation is the same: the kinetic gradients are inverses, and the configuration gradients are negatives.

One can generalize (4) to non-convex functions as

(6)   \begin{align*} g(\vec{x}) = \sup_{\vec{y}} \left[\vec{x}\cdot\vec{y} - f(\vec{y}) \right], \end{align*}

where the supremum is of course just the formal way of talking about a maximum over all choices of \vec{y}. When f(\vec{y}) is convex, the right-hand side is maximized under the condition that \vec{\partial}f(\vec{y}) = \vec{x}, so we recover (4). But more generally, this formula ensures the transform is well defined for any f: \mathbb{R}^n \to \mathbb{R}, and the output g(\vec{x}) is convex regardless.[13] Basically, we’ve defined a way of breaking the ambiguity that comes when \vec{\partial}f doesn’t have a unique inverse, and it turns out that this judicious choice ensures g(\vec{x}) is the Legendre transform of the convex hull of f(\vec{y})! In other words, applying the Legendre transform twice acts as the identity on convex functions but it produces the convex hull on non-convex ones.[14]
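
The convex-hull claim can be seen on a grid. This is a coarse sketch of Eq. (6) in one dimension, with the supremum replaced by a maximum over sample points; the double-well f(y) = (y^2 - 1)^2, the grid ranges, and the names `conjugate` and `biconjugate` are all illustrative assumptions.

```python
def conjugate(values, points, slopes):
    """Eq. (6) on a grid: g(s) = max over sampled y of [s*y - f(y)]."""
    return [max(s * y - fy for y, fy in zip(points, values)) for s in slopes]


def f(y):
    return (y * y - 1.0) ** 2   # illustrative non-convex double well


ys = [k / 100 for k in range(-300, 301)]   # sample points for f
xs = [k / 20 for k in range(-400, 401)]    # sample slopes for g

g_vals = conjugate([f(y) for y in ys], ys, xs)


def biconjugate(y):
    """The transform applied twice, evaluated at a single point y."""
    return max(x * y - gx for x, gx in zip(xs, g_vals))


print(f(0.0), biconjugate(0.0))   # inside the well: the hull is flat between y = -1 and y = 1
print(f(1.5), biconjugate(1.5))   # outside the well: f is already convex there
```

Between the two minima the double transform returns the flat bottom of the convex hull rather than the bump of f, while outside them it reproduces f.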


The Legendre transform (green arrow) is an involution on the space of convex functions (blue). It leaves invariant the small set of functions (red) whose first derivatives are their own inverse, like x \mapsto x^2/2 and x \mapsto \ln x. The generalized definition (6) acts on the larger space of non-convex functions (cylindrical volume), but still maps it down to the space of convex ones. The Legendre transform of a non-convex function is the Legendre transform of its convex hull (purple arrow), which can be obtained by just applying the transform twice.

[I thank Robert Lasenby and Godfrey Miller for discussion.]

Footnotes


  1. I was pleased to note as this essay went to press that my choice of Landau, Goldstein, and Arnold were confirmed as the “standard” suggestions by the top Google results.
  2. If not, can you remember the definition? I couldn’t, a month ago.
  3. Landau & Lifshitz, Hand & Finch, Rossberg, plus a bunch I hadn’t heard of before.
  4. To keep my blood pressure in check, I’m just going to skip completely over the fact that this Hamiltonian has t dependence. Almost all physics textbooks fail to clearly explain to the student why we can nevertheless indulge in the sin of pretending that \dot{q} is an independent variable from q, attributable perhaps to the mystery of faith. For clarity on this, I suggest picking up Gelfand and Fomin’s “Calculus of variations”, which is well regarded and available on Amazon for $9.
  5. A similar precise but opaque definition is given in the text by Jose and Saletan, which was inflicted upon me in graduate school. Figures like the one below can be found in Hand & Finch, and in various notes floating around that promise an “easy introduction” to the Legendre transform.
  6. Scott Aaronson, PHYS771 Lecture 9: Quantum: ‘So, what is quantum mechanics? Even though it was discovered by physicists, it’s not a physical theory in the same sense as electromagnetism or general relativity. In the usual “hierarchy of sciences” — with biology at the top, then chemistry, then physics, then math — quantum mechanics sits at a level between math and physics that I don’t know a good name for. Basically, quantum mechanics is the operating system that other physical theories run on as application software (with the exception of general relativity, which hasn’t yet been successfully ported to this particular OS). There’s even a word for taking a physical theory and porting it to this OS: “to quantize.”’

    Note, by “the level between physics and math”, Aaronson is talking about the basic information theoretic underpinning of quantum mechanics, whose classical analog is probability theory (which, classically, is generally assumed without discussion). In this post I am talking about the abstract formalism of Lagrangian/Hamiltonian mechanics, which sits above quantum/classical information theory but still below (i.e., more fundamental than) particular physical theories like electromagnetism or gravity.

  7. Thanks to Godfrey Miller for this link.
  8. See also Arnold’s Figure 45, where he tries (in my opinion, hopelessly) to make the involution property intuitive.
  9. By the way, you weren’t fooled by the Aharonov-Bohm effect into thinking that the potential is more “real” in quantum mechanics, were you?
  10. Here I’m adopting something close to the functional notation of Sussman and Wisdom.
  11. We actually only need this equation to hold on a single value of the kinetic variable since the previous boxed equation automatically extends it to all values. What’s happening here is this: although the additive offset we choose when transforming the kinetic variable (v \leftrightarrow p) is arbitrary, we need to choose the same offset for different values of the configuration coordinate (x) since the configuration gradient is physically important.
  12. At a more advanced level this is known to map between the tangent and cotangent bundles, so the domain and range are isomorphic but not strictly the same.
  13. Indeed, generalizing the Legendre transform to non-convex functions, where it is usually known as the convex conjugate, is a justification for using a definition like Eq. (1) rather than \vec{\partial}f = (\vec{\partial}g)^{-1}. See here for more, as well as a nice description in terms of hyperplanes.
  14. I thank Kristan Temme for emphasizing this non-mechanical aspect of the Legendre transform to me.

2 Comments

  1. Thank you for this post!

    Making use of slightly more abstract mathematical concepts to truly understand stuff that you are normally taught only in a “practical” way is one of the most satisfying feelings I have while studying physics.

  2. Thank you for this! Poor explanations survive because nobody wants to admit they found the greats (L&L, Goldstein…) confusing!
