How to think about Quantum Mechanics—Part 1: Measurements are about bases

[This post was originally “Part 0”, but it’s been moved. Other parts in this series: 1,2,3,4,5,6,7,8.]

In an ideal world, the formalism that you use to describe a physical system is in a one-to-one correspondence with the physically distinct configurations of the system. But sometimes it can be useful to introduce additional descriptions, in which case it is very important to understand the unphysical over-counting (e.g., gauge freedom). A scalar potential V(x) is a very convenient way of representing the vector force field, F(x) = -\partial V(x), but any constant shift in the potential, V(x) \to V(x) + V_0, yields forces and dynamics that are indistinguishable, and hence the value of the potential on an absolute scale is unphysical.

One often hears that a quantum experiment measures an observable, but this is wrong, or very misleading, because it vastly over-counts the physically distinct sorts of measurements that are possible. It is much more precise to say that a given apparatus, with a given setting, simultaneously measures all observables with the same eigenvectors. More compactly, an apparatus measures an orthogonal basis, not an observable.[a] You can probably start to see this by just noting that there’s no actual, physical difference between measuring X and X^3; the apparatuses that would perform the two measurements are identical.
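To make the X versus X^3 remark concrete, here is a minimal sketch in plain Python. The two-level basis, the eigenvalue labels 2 and -1, the test state, and all function names are arbitrary choices for illustration, not anything taken from Zurek's papers. It checks that cubing the operator only relabels the eigenvalues, and that the outcome probabilities are computed from the basis alone.

```python
import math

def projector(vec):
    """Return the matrix |v><v| for a vector given as a list of numbers."""
    return [[a * complex(b).conjugate() for b in vec] for a in vec]

def from_spectrum(eigenvalues, basis):
    """Build sum_i lambda_i |s_i><s_i| from eigenvalue labels and an orthonormal basis."""
    dim = len(basis)
    op = [[0j] * dim for _ in range(dim)]
    for lam, vec in zip(eigenvalues, basis):
        proj = projector(vec)
        for r in range(dim):
            for c in range(dim):
                op[r][c] += lam * proj[r][c]
    return op

def matmul(a, b):
    """Plain matrix product of two square matrices stored as nested lists."""
    dim = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(dim)) for c in range(dim)]
            for r in range(dim)]

def outcome_probabilities(state, basis):
    """Born-rule probabilities |<s_i|psi>|^2; only the basis enters, never the eigenvalues."""
    return [abs(sum(complex(s).conjugate() * p for s, p in zip(vec, state))) ** 2
            for vec in basis]

# Hypothetical two-level example: the basis and the labels 2 and -1 are arbitrary.
h = 1 / math.sqrt(2)
basis = [[h, h], [h, -h]]
labels = [2.0, -1.0]

X = from_spectrum(labels, basis)
X_cubed_labels = from_spectrum([lam ** 3 for lam in labels], basis)
X_cubed_matrix = matmul(matmul(X, X), X)

# Cubing the operator only cubes the labels; the eigenvectors are untouched.
max_diff = max(abs(X_cubed_matrix[r][c] - X_cubed_labels[r][c])
               for r in range(2) for c in range(2))

psi = [0.6, 0.8]  # arbitrary normalized test state
probs = outcome_probabilities(psi, basis)
print(max_diff, probs)
```

Note that `outcome_probabilities` never sees `labels` at all, which is the point: the statistics of the measurement are fixed by the basis, and the eigenvalues are just names attached afterwards.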

In the rest of this post, I’ll lay things out very explicitly. I’m going to show how simply acknowledging that a measurement is carried out by a physical apparatus is enough to infer

  1. that the set of possible eigenstate outcomes (the basis) is all that physically matters,
  2. that the basis must be orthogonal, and consequently that,
  3. it’s just as sensible to talk about the measurement of non-Hermitian normal operators as traditional observables (Hermitian operators).

I’ll mostly be following Zurek,[b] who first pointed out (2). For simplicity we’ll assume a finite-dimensional Hilbert space. None of this requires you to adopt a many-worlds interpretation or anything; feel free to just stick with Copenhagen and pull the Heisenberg cut up a bit higher so the apparatus is contained within the quantum description.

Toy measurement model

Consider what a physical measuring apparatus \mathcal{A} actually does when it measures a system \mathcal{S}. From some “ready” state \vert A_0 \rangle, initially unentangled with the system, the apparatus interacts unitarily such that different possible states \vert S_i \rangle of the system are recorded in distinct conditional out-states \vert A_i \rangle of the apparatus. These out-states will correspond, at the least, to different macroscopic configurations of the apparatus’s readout system (the “pointer”), e.g., the macroscopic arrangements of atoms in a screen interpreted as “up” rather than “down”.

Let us first assume the apparatus can make a non-disturbing measurement. Then for each i, the unitary U_{\mathrm{M}} describing the measurement process must act in this manner:

(1)   \begin{align*} \vert S_i \rangle \vert A_0 \rangle \quad \overset{U_{\mathrm{M}}}\to \quad U_{\mathrm{M}} \left(\vert S_i \rangle \vert A_0 \rangle \right) =   \vert S_i \rangle \vert A_i \rangle \end{align*}

A defining characteristic of unitaries is that they preserve the inner product between vectors, so

(2)   \begin{align*} \langle S_i \vert S_j \rangle \langle A_0 \vert A_0 \rangle =   \langle S_i \vert S_j \rangle \langle A_i \vert A_j \rangle. \end{align*}

Since \langle A_0 \vert A_0 \rangle = 1, Eq. (2) can be rearranged to \langle S_i \vert S_j \rangle \left(1 - \langle A_i \vert A_j \rangle\right) = 0. The requirement that the measuring device evolves into distinct states, \langle A_i \vert A_j \rangle \neq 1 for different outcomes i \neq j, then immediately implies that \langle S_i \vert S_j \rangle = \delta_{ij}, i.e., that the set of system states being distinguished must be orthogonal.
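As a sanity check on this argument, here is a small sketch in plain Python. It assumes a single-qubit system and a single-qubit apparatus, and takes a CNOT gate as a stand-in for U_M; that choice, the state names, and the helper functions are mine, purely for illustration. The orthogonal states |0> and |1> are recorded without disturbance, while the non-orthogonal state |+> cannot also be recorded that way: the same unitary leaves it entangled with the apparatus instead of in a product state of the form in Eq. (1).

```python
import math

def kron(u, v):
    """Tensor product of two state vectors stored as flat lists."""
    return [a * b for a in u for b in v]

def inner(u, v):
    """Inner product <u|v>."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def apply(matrix, vec):
    """Apply a matrix (nested lists) to a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

def is_product(two_qubit_state, tol=1e-12):
    """A two-qubit pure state (a, b, c, d) is a product state iff a*d - b*c = 0."""
    a, b, c, d = two_qubit_state
    return abs(a * d - b * c) < tol

# Assumed measurement unitary: CNOT with the system as control and the
# apparatus as target, written in the basis |S A> = |00>, |01>, |10>, |11>.
U_M = [[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 1, 0]]

zero, one = [1.0, 0.0], [0.0, 1.0]
h = 1 / math.sqrt(2)
plus = [h, h]
ready = zero  # the apparatus "ready" state |A_0>

out_zero = apply(U_M, kron(zero, ready))  # |0>|A_0> -> |0>|0>
out_one = apply(U_M, kron(one, ready))    # |1>|A_0> -> |1>|1>
out_plus = apply(U_M, kron(plus, ready))  # |+>|A_0> -> (|00> + |11>)/sqrt(2)

# Unitarity preserves overlaps, as used in Eq. (2).
overlap_before = inner(kron(zero, ready), kron(plus, ready))
overlap_after = inner(out_zero, out_plus)

print(is_product(out_zero), is_product(out_one), is_product(out_plus))
print(overlap_before, overlap_after)
```

Here `is_product(out_plus)` comes out false, so this particular apparatus measures the basis {|0>, |1>} and nothing else; no relabeling of its pointer states turns it into a non-disturbing measurement that also distinguishes |+>.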

Now, let’s relax the assumption that the measurement is non-disturbing. Instead, we will appeal to the key characteristic of a measuring apparatus — that it must amplify. More precisely, the apparatus must contain many parts \mathcal{A} = \otimes_n \mathcal{A}^{(n)} in which the outcome is recorded distinctly. For simplicity, let us simply define \mathcal{A}^{(n)} for n = 1,\dots, N to be the minimal degrees of freedom which are put into a distinct state conditional on the outcome of the measurement, and let \mathcal{A}^{(\varnothing)} be the (messy) rest of the apparatus, which will generally become entangled with the system. Then for each i we must have

(3)   \begin{align*} \vert S_i \rangle \vert A_0 \rangle = \vert S_i \rangle \vert A_0^{(\varnothing)} \rangle \vert A_0^{(1)} \rangle \vert A_0^{(2)} \rangle \cdots   \quad \overset{U_{\mathrm{M}}}\to \quad  \vert SA^{(\varnothing)}_i \rangle \vert A_i^{(1)} \rangle \vert A_i^{(2)}\rangle \cdots \end{align*}

where \vert SA^{(\varnothing)}_i \rangle is an arbitrary, possibly entangled joint state of \mathcal{S}\otimes \mathcal{A}^{(\varnothing)}. We again have not assumed a priori that the \vert S_i \rangle or the \vert A_i^{(n)} \rangle are orthogonal, just that they are distinct states. Nonetheless, unitary evolution preserves the inner product between states, so

(4)   \begin{align*} \langle S_i\vert S_j \rangle =  \langle SA^{(\varnothing)}_i \vert SA^{(\varnothing)}_j \rangle \left( \prod_{n=1}^N \langle A_i^{(n)} \vert A_j^{(n)} \rangle\right). \end{align*}

Then, regardless of the value of \langle SA^{(\varnothing)}_i \vert SA^{(\varnothing)}_j \rangle, we must have that \langle S_i\vert S_j \rangle = \delta_{ij} unless \vert \langle A_i^{(n)} \vert A_j^{(n)} \rangle \vert \to 1 for almost all n. Since a functioning amplifier must produce many distinct copies (records) of the amplified information, we conclude that the system states we are distinguishing, \{\vert S_i \rangle\}, are orthogonal.
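To get a feel for how quickly amplification forces orthogonality, here is a rough numerical sketch. It takes the absolute value of Eq. (4) and uses the fact that the overlap of the messy part has magnitude at most 1, so the product of the record overlaps bounds the system overlap from above. The per-record overlap of 0.9 and the function name are hypothetical choices for illustration.

```python
def max_system_overlap(record_overlaps):
    """Upper bound on |<S_i|S_j>| implied by Eq. (4).

    Uses |<SA_i|SA_j>| <= 1, so the bound is the product of the
    magnitudes of the per-record overlaps <A_i^(n)|A_j^(n)>.
    """
    bound = 1.0
    for overlap in record_overlaps:
        bound *= abs(overlap)
    return bound

# Hypothetical amplifier: every record is only weakly distinguishing,
# with overlap 0.9 between the two conditional record states.
for n_records in (1, 10, 100, 1000):
    print(n_records, max_system_overlap([0.9] * n_records))
```

Even with each individual record distinguishing the outcomes this poorly, the bound falls off exponentially in the number of records, so for anything macroscopic the system states are orthogonal for all practical purposes.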

Note that we have not lost the generality of our argument by assuming that the various components of the apparatus \mathcal{A}^{(n)} end up in pure states, unentangled with the rest of the apparatus and system. Our only requirement is that, for something to be a proper amplifier, one can choose some tensor structure \mathcal{A} = \otimes_n \mathcal{A}^{(n)} in which this is so, and that’s always possible even if the natural, intuitive parts of the system in which the copies of the information are stored (e.g., the atoms of the macroscopic pointer readout) are in mixed states (so long as the mixed states are distinct). See Zurek for details.

Implications

So it’s clear from what a measuring apparatus actually does that there is no physical difference between measuring two observables with the same eigenvectors, for the same reason that, even classically, there’s no physical difference between measuring in centimeters and inches; it’s just the labeling on your ruler. The only thing that is meaningful is the orthogonal basis \{\vert S_i \rangle\} defining the measurement process. All that talking about observables adds to this is naming the eigenstates.[c]

In fact, it makes as much sense to measure a normal operator as a Hermitian one. Recall that normal and Hermitian operators are defined by the conditions [\mathcal{O},\mathcal{O}^\dagger]=0 and \mathcal{O} = \mathcal{O}^\dagger, respectively. (Obviously, Hermitian operators are a subset of normal operators.) Equivalently, we can say that normal operators are defined by the fact that they have a complete orthonormal basis of eigenvectors, while Hermitian operators must additionally have real eigenvalues. It’s perfectly sensible to say, when we are determining the amplitude and phase of a macroscopic electromagnetic field, that we are measuring a single normal operator whose eigenvalues are complex. And if we wanted to be ornery, we could point out that there’s really nothing objectionable about measuring an operator

(5)   \begin{align*} \mathcal{O} = \eta_\mathrm{red}\vert S_\mathrm{red} \rangle \langle S_\mathrm{red} \vert + \eta_\mathrm{green}\vert S_\mathrm{green} \rangle \langle S_\mathrm{green} \vert \end{align*}

where \eta_\mathrm{red} and \eta_\mathrm{green} are elements of some (possibly finite) field which is neither the reals nor the complex numbers. In all these cases, the only thing that matters is the set of states \{\vert S_i \rangle\}.
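For the complex-eigenvalue case, the claim is easy to check numerically. The sketch below, in plain Python, builds an operator of the form in Eq. (5) on an orthonormal basis with two complex labels (1+2i and 3-i, picked arbitrarily), and confirms that it is normal but not Hermitian. For contrast it also checks a lowering-type operator, which is not normal and so has no orthonormal eigenbasis to measure. All names and numbers here are illustrative assumptions.

```python
import math

def dagger(m):
    """Conjugate transpose of a square matrix stored as nested lists."""
    dim = len(m)
    return [[complex(m[c][r]).conjugate() for c in range(dim)] for r in range(dim)]

def matmul(a, b):
    """Plain matrix product of two square matrices."""
    dim = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(dim)) for c in range(dim)]
            for r in range(dim)]

def commutator(a, b):
    """Return [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    dim = len(a)
    return [[ab[r][c] - ba[r][c] for c in range(dim)] for r in range(dim)]

def max_abs(m):
    """Largest entry magnitude, used here as a crude matrix norm."""
    return max(abs(x) for row in m for x in row)

# Orthonormal basis with real components, so no conjugation is needed below.
h = 1 / math.sqrt(2)
plus, minus = [h, h], [h, -h]
eta_plus, eta_minus = 1 + 2j, 3 - 1j  # arbitrary complex eigenvalue labels

# O = eta_plus |+><+| + eta_minus |-><-|
O = [[eta_plus * plus[r] * plus[c] + eta_minus * minus[r] * minus[c]
      for c in range(2)] for r in range(2)]
O_dag = dagger(O)

normal_defect = max_abs(commutator(O, O_dag))  # zero up to rounding
hermitian_defect = max_abs([[O[r][c] - O_dag[r][c] for c in range(2)]
                            for r in range(2)])  # clearly nonzero

# A lowering-type operator is not normal: its commutator with its adjoint is nonzero.
lowering = [[0, 1], [0, 0]]
lowering_defect = max_abs(commutator(lowering, dagger(lowering)))

print(normal_defect, hermitian_defect, lowering_defect)
```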

(Note that there are still plenty of places in quantum mechanics where the Hermiticity of an operator is critical, such as the Hamiltonian. But then the meaningfulness of the reality of the eigenvalues is connected to the fact that the Hamiltonian is not just something that can be measured, but is used to generate time translation, in which case the eigenvalues are “doing work”.)

Blame

Why are the above simple observations not known by undergraduates, or even by professors? I tentatively blame the axiomatic approaches to quantum mechanics as put forth by the titans like Dirac and von Neumann, or at least their typical presentation to other physicists. In particular, when you take

  • The expectation value of an observable A for a system in a state \psi is given by \langle \psi \vert A \vert \psi \rangle.

as an irreducible axiom of the universe, you obscure a great deal. This seems to be grounded in early formulations of Copenhagen, where the measurement operation was a definitive event, linking the quantum description with observed classical variables at a time and place. (This is to be contrasted with modern Copenhagen approaches where arbitrarily large objects can in principle be given quantum descriptions and the Heisenberg cut is fluid…as long as it is placed somewhere.[d])

Of course, it’s clear that von Neumann had deep, deep insights about the completeness of the quantum description and the problems with hidden variables,[e] and that this was achieved by linking what could actually be discovered about a system to complete sets of observables (maximal sets of commuting Hermitian operators). Nonetheless, there is a danger in taking these mathematical objects too seriously, and not taking seriously enough the fundamentally quantum nature of an apparatus.

Footnotes


  a. We can also allow for the measured observable to be degenerate, in which case the apparatus simultaneously measures all observables with the same degenerate eigenspaces. To be abstract, you could say it measures a commuting subalgebra, with the nondegenerate case corresponding to the subalgebra having maximum dimensionality (i.e., the same number of dimensions as the Hilbert space). Commuting subalgebras with maximum dimension are in one-to-one correspondence with orthonormal bases, modulo multiplying the vectors by pure phases.
  b. Wojciech H. Zurek, Phys. Rev. A 76, 052110 (2007), [arXiv:quant-ph/0703160]; Phys. Rev. A 87, 052111 (2013) [arXiv:1212.3245].
  c. Of course there is a physical difference between measuring X and X^2, since the latter would imply an apparatus that moves into the exact same conditional out-state if the system starts in either eigenstate \vert X = x_0\rangle or \vert X = -x_0\rangle.
  d. Heisenberg: “The dividing line between the system to be observed and the measuring apparatus is immediately defined by the nature of the problem but it obviously signifies no discontinuity of the physical process. For this reason there must, within limits, exist complete freedom in choosing the position of the dividing line.” See Schlosshauer and Camilleri (2011).
  e. This work strongly contributed to Bell’s theorem. There is disagreement as to whether von Neumann’s proof against hidden variables was foolish or whether von Neumann understood the limitations of his conclusions.

15 Comments

  1. I wonder whether you’re OK with one particular consequence of allowing normal operators, that the sum of two normal operators, \hat N_1+\hat N_2, may not be a normal operator, [\hat N_1+\hat N_2,\hat N_1^\dagger+\hat N_2^\dagger]=[\hat N_1,\hat N_2^\dagger]+[\hat N_2,\hat N_1^\dagger], whereas the sum of Hermitian operators is Hermitian (\hat H_1+\hat H_2)^\dagger=\hat H_1+\hat H_2?

    Secondly, if we measure the spectra of Hermitian operators \hat H_1, \hat H_2, and \lambda\hat H_1+(1-\lambda)\hat H_2, comparison of the latter (taking various real values of \lambda) with the first two gives us some information about the relative orientation of the eigenvectors of \hat H_1 and of \hat H_2 (and more so if we consider more Hermitian operators, though the computation looks daunting), so the eigenvalues are significant at least to that extent?

    Though I agree that a focus on the spectra and the relative orientations of eigenbases of measurements would not be amiss, these two aspects taken together might perhaps be enough to make it distracting to introduce your “simple observations” to undergraduates?

    Operationally, however, I’m curious whether I know how to measure \lambda\hat H_1+(1-\lambda)\hat H_2 just because I know how to measure both \hat H_1 and \hat H_2?

    • Thanks Peter, these are great questions for refining my position.

      Yea, I’m very comfortable with the fact that the set of things we measure with an apparatus is not closed under addition for the same reason that I’m fine that the sum of a position and a momentum is undefined. Operationally, two things that I can measure don’t have to have a well defined sum. And mathematically, it shouldn’t bother us that the sum of two different bases of the same vector space isn’t defined.

      The thing I’m trying to do is (1) identify the set of mathematical structures that correspond to what we can physically measure and (2) point out that the Hermitian operators are not — and are not in 1-to-1 correspondence with — that set. In particular, I didn’t claim that normal operators are measurable, I claimed that they are just as measurable as Hermitian operators. But maybe it’s better to simply emphasize that measurements should be identified with PVMs/POVMs.

      <rant>Regarding undergraduates: I remember as an undergraduate being utterly baffled by the bizarre declaration that Hermitian operators corresponded to observables. (In some sense, it was the ultimate distraction since a decade later I am still following the path it sent me down!) Such an axiom had no counterpart in classical mechanics, and it was never justified. In my opinion, the people who managed to avoid this distraction were just learning to accept things without understanding them. This ability to suspend disbelief is great if you want to create good graduate student slaves for doing computations, but not so good for training the next generation of physicists. </rant>

      The question of whether you can operationalize the notion of measuring the sum of two normal operators is an interesting one. I don’t know the answer. I haven’t even seen someone try to operationalize the sum of two Hermitian operators. If you have a device that can measure spin Z and spin X, there doesn’t seem to me to be any reason for the sum of those two to be measurable with that device. I interpret this as more evidence that “the set of things we can measure” should not be identified with the Hermitian operators.

      Indeed, it was only after I had been in graduate school for a while that I discovered a bit of the historical background behind the mathematical importance of Hermitian operators, especially as elucidated by von Neumann. The mistake wasn’t the emphasis on understanding these objects, but rather the terrible attempt to identify them with the physical process of measurement.

      • FWIW, your answers seem OK to me, though I suppose different materials are needed for different undergraduates.
        My questions came out of a current interest in digital (that is, binary, on a computer) records of experimental results. An Analogue-Digital Conversion is effectively constructed as a set of projections \chi_{[a,b]}(\hat A) [is the recorded value associated with the real-valued observable in the range [a,b], or more generally in a set S (a set in the complex plane if \hat A is normal)? Answer=0 or 1; lots of such questions gives us, say, a 16-bit recorded result in computer memory; any such process is determinedly neither linear nor continuous, yet they’re routine, millions of times over, at CERN, say].
        If we have an apparatus that we say measures \chi_{[a,b]}(\hat A) and \chi_{[a,b]}(\hat B) as two bits, we might think of that as measuring \chi_{[a,b]}(\hat A)\times 2+\chi_{[a,b]}(\hat B), which is OK if [\hat A,\hat B]=0, however if [\hat A,\hat B]\not=0 the eigenvalues of \chi_{[a,b]}(\hat A)\times 2+\chi_{[a,b]}(\hat B) in general will not be \{0,1,2,3\}. We could say that i\in\{0,1,2,3\} represents the i‘th eigenvalue of \chi_{[a,b]}(\hat A)\times 2+\chi_{[a,b]}(\hat B), or we could say that we have applied a second nonlinear ADC process to obtain our two bit record (or we might say something altogether different?). [Again at CERN, computer records are as often clearly of events that are at time-like separation and hence a priori do not commute.]
        It has always seemed to me rather more interesting to know what we have measured than what the measurement results are, which is as if to say in theory-speak that the eigenvectors of an operator give us more information than the eigenvalues, which is, I take it, a crude paraphrase of part of your blog post.
        Perhaps I need <rant>…</rant> round all that? Answer any part of it that you find it useful to contemplate, but leave it alone otherwise.

        • I’m pretty sure that the eigenvalues of \chi_{[a,b]}(\hat A)\times 2+\chi_{[a,b]}(\hat B) are not in general \{0,1,2,3\} for [\hat A,\hat B] \neq 0, so measuring that composite variable just doesn’t have a simple interpretation as measuring two binary variables. One way to try to make such a measurement would be to measure \chi_{[a,b]}(\hat A) followed by \chi_{[a,b]}(\hat B), but then the order becomes important.

          Even when two measurements are not space-like separated, they very often commute (to very high accuracy) simply because the coupling between them is very weak. Different detectors within ATLAS often correspond to commuting measurements for the same reason that time-like separated measurements made on different continents do. And even when they don’t, there’s an objective ordering guaranteed by their time-like separation.

      • Apparently my last paragraph “rant”…”/rant” was converted to just … . Hey ho.

        • Yea, I had to use a raw unicode character for the less-than signs to keep it from trying to parse it as html. This is typed in as

          & # 60 ;

          only without any spaces. Took me a couple of tries on my first attempt.

          Hope you don’t mind that I edited your comment to fix this for you :)

  2. This morning, we have this in a Springer notification e-mail, https://link.springer.com/article/10.1007/s40509-016-0098-2, “Are observables necessarily Hermitian?” I haven’t looked at the paper yet past the abstract, which says, almost exactly on this post’s topic, that “observables should be reformulated as normal operators including Hermitian operators as a subclass” (the arXiv version is https://arxiv.org/abs/1601.04287, which predates your post).

    • Thanks Peter! Although their paper (Jan 2016) predates this post (Nov 2016), this post is actually just a fleshed out version of the same point which I have had on my website since 2013:

      > Wrong: Observables are “represented” (?) by Hermitian operators.
      > Right: Measurements necessarily amplify, and therefore (!) are associated
      > with an orthogonal basis. This is the Schmidt basis of the entangled
      > joint state of the measuring apparatus and the measured system.
      > More: Wojciech H. Zurek, Phys. Rev. A 76, 052110 (2007),
      > [arXiv:quant-ph/0703160]. Also: [arXiv:1212.3245].
      > Implication: Observables can be associated with normal, not just
      > Hermitian, operators.

      Furthermore, the idea that normal operators are observables follows almost immediately from Wojciech’s 2007 paper (which Hu et al. cite). That paper was one of the reasons I sought him out as my advisor, physically moving from California to New Mexico, and I probably just picked the idea up from him during discussion while I was there.

      Of course, the idea itself is much more important than priority. I’d be very glad if their publication makes this idea common knowledge taught in introductory quantum mechanics courses. But I’m not holding my breath…

  3. Pingback: Blogs I Follow – EGO PON's Blog

  4. In your “Implications” section, when you say that a normal operator is “non-singular”, do you mean “non-defective”? A normal operator can certainly be singular.

    • Thanks Teddy. Honestly I can’t remember what I was trying to say with that condition back when I wrote this. You’re right of course that normal (and Hermitian) operators can perfectly well be singular (i.e., not have an inverse); they just need zero eigenvalues. I’ve now removed that condition from the post.

      • One other thought about your footnote “a”, regarding degenerate observables. Instead of saying that a generic (possibly degenerate) measurement apparatus represents a commuting subalgebra, can’t you instead say that it represents a particular decomposition of the Hilbert space into an (internal) direct sum of orthogonal subspaces? Each measurement outcome corresponds to a different subspace, and the nondegenerate case corresponds to a maximal decomposition into a direct sum of only one-dimensional subspaces.

        This seems (to me) conceptually simpler than thinking in terms of commuting algebras, although that’s perhaps a matter of opinion. And more importantly, it seems closer to the spirit of the rest of your discussion, where your point is that a particular measurement apparatus is better thought of as being associated with an intrinsic property of the Hilbert space itself (e.g. a particular direct-sum decomposition or choice of basis for the space) than with specific operators on that Hilbert space.

  5. It seems to me that your equation (3) should allow for the entanglement of the states |A_i^{(j)}\rangle after the measurement, in which case (4) does not follow immediately.

    Even if we assume (4), it only implies that the system \{|S_i\rangle\} is approximately orthonormal, with bigger, more amplifying measurement apparatuses leading to smaller values of |\langle S_i \vert S_j\rangle| for i\ne j.

    • I do address your first sentence when I say “Note that we have not lost the generality of our argument by assuming that the various components of the apparatus \mathcal{A}^{(i)} end up in pure states…”. However, even if we take the tensor structure to be fixed and we allow for the different subsystems to become entangled, one can check that we get the same conclusion in the end so long as the different subsystems have distinct reduced states conditional on the index i, i.e., that \rho^{(n)}_i \neq \rho^{(n)}_j for i\neq j. That is, all that is necessary is that some useful information about the outcome appears in the many subsystems indexed by n.

      I agree with your second sentence, but it’s important to emphasize that the overlaps between the states \{|S_i\rangle\} must be exponentially small in the number N of subsystems. For any reasonable amplifier, this is extremely tiny.
