Branches as hidden nodes in a neural net

I had been vaguely aware that there was an important connection between tensor network representations of quantum many-body states (e.g., matrix product states) and artificial neural nets, but it didn’t really click together until I saw Roger Melko’s nice talk on Friday about his recent paper with Torlai et al. (There is a title card about “resurgence” from Francesco Di Renzo’s talk at the beginning of the video that you can ignore; this is just a mistake in KITP’s video system.)






In particular, he sketched the essential equivalence between matrix product states (MPS) and restricted Boltzmann machines (RBM) before showing how he and collaborators could train efficient RBM representations of the states of the transverse-field Ising and XXZ models with a small number of local measurements from the true state. (The MPS–RBM equivalence is discussed in detail by Chen et al. See also good intuition and a helpful physicist-statistician dictionary from Lin and Tegmark.)

As you’ve heard me belabor ad nauseam, I think identifying and defining branches is the key outstanding task inhibiting progress in resolving the measurement problem. I had already been thinking of branches as a sort of “global” tensor in an MPS, i.e., there would be a single index (bond) that would label the branches and serve to efficiently encode a pure state with long-range entanglement due to the amplification that defines a physical measurement process. (More generally, you can imagine branching events with effects that haven’t propagated outside of some region, such as the light-cone or Lieb-Robinson bound, and you might even make a hand-wavy connection to entanglement renormalization.) But I had little experience with constructing MPSs, and finding efficient representations always seemed like an ad-hoc process yielding non-unique results.… [continue reading]

Models of decoherence and branching

[This is akin to a living review, which will hopefully improve from time to time. Last edited 2017-10-29.]

This post will collect some models of decoherence and branching. We don’t have a rigorous definition of branches yet but I crudely define models of branching to be models of decoherence which additionally feature some combination of amplification, irreversibility, redundant records, and/or outcomes with an intuitive macroscopic interpretation. (I take decoherence to mean a model with dynamics taking the form U \approx \sum_i \vert S_i\rangle\langle S_i \vert \otimes U^{\mathcal{E}}_i for some tensor decomposition \mathcal{H} = \mathcal{S} \otimes \mathcal{E}, where \{\vert S_i\rangle\} is an (approximately) stable orthonormal basis independent of initial state, and where \mathrm{Tr}[ U^{\mathcal{E}}_i \rho^{\mathcal{E}}_0 U^{\mathcal{E} \dagger}_j ] \approx 0 for times t \gtrsim t_D and i \neq j, where \rho^{\mathcal{E}}_0 is the initial state of \mathcal{E} and t_D is some characteristic time scale.) I have the following desiderata for models, which tend to be in tension:

  • computational tractability
  • physical realism
  • symmetry (e.g., translational)
  • no ad-hoc system-environment distinction
  • Ehrenfest evolution along classical phase-space trajectories (at least on Lyapunov timescales)

Regarding that last one: we would like to recover “classical behavior” in the sense of classical Hamiltonian flow, which (presumably) means continuous degrees of freedom. (In principle you could have discrete degrees of freedom that limit, as \hbar\to 0, to some sort of discrete classical systems, but most people find this unsatisfying.) Branching only becomes unambiguous in some large-N limit, so it seems satisfying models are necessarily messy and difficult to numerically simulate. At the minimum, a good model needs time asymmetry (in the initial state, not the dynamics), sensitive dependence on initial conditions, and a large bath. Most branching will (presumably) be continuous both in time and in number of branches, like a decaying atom where neither the direction nor the time of decay is discrete.… [continue reading]

Comments on Weingarten’s preferred branch

A senior colleague asked me for thoughts on this paper describing a single-preferred-branch flavor of quantum mechanics, and I thought I’d copy them here. Tl;dr: I did not find an important new idea in it, but this paper nicely illustrates the appeal of Finkelstein’s partial-trace decoherence and the ambiguity inherent in connecting a many-worlds wavefunction to our direct observations.


We propose a method for finding an initial state vector which by ordinary Hamiltonian time evolution follows a single branch of many-worlds quantum mechanics. The resulting deterministic system appears to exhibit random behavior as a result of the successive emergence over time of information present in the initial state but not previously observed.

We start by assuming that a precise wavefunction branch structure has been specified. The idea, basically, is to randomly draw a branch at late times according to the Born probability, then to evolve it backwards in time to the beginning of the universe and take that as your initial condition. The main motivating observation is that, if we assume that all branch splittings are defined by a projective decomposition of some subsystem (‘the system’) which is recorded faithfully elsewhere (‘the environment’), then the lone preferred branch, time-evolving by itself, is an eigenstate of each of the projectors defining the splits. In a sense, Weingarten lays claim to ordered consistency [arxiv:gr-qc/9607073] by assuming partial-trace decoherence (Note on terminology: What Finkelstein called “partial-trace decoherence” is really a specialized form of consistency, i.e., a mathematical criterion for sets of consistent histories, that captures some, but not all, of the properties of the physical and dynamical process of decoherence.)… [continue reading]

Symmetries and solutions

Here is an underemphasized way to frame the relationship between trajectories and symmetries (in the sense of Noether’s theorem). (You can find this presentation in “A short review on Noether’s theorems, gauge symmetries and boundary terms” by Máximo Bañados and Ignacio A. Reyes; h/t Godfrey Miller.) Consider the space of all possible trajectories q(t) for a system, a real-valued Lagrangian functional L[q(t)] on that space, the “directions” \delta q(t) at each point, and the corresponding functional gradient \delta L[q(t)]/\delta q(t) in each direction. Classical solutions are exactly those trajectories q(t) such that the Lagrangian L[q(t)] is stationary for perturbations in any direction \delta q(t), and continuous symmetries are exactly those directions \delta q(t) such that the Lagrangian L[q(t)] is stationary for any trajectory q(t). That is,

(1)   \begin{align*} q(t) \mathrm{\,is\, a\,}\mathbf{solution}\quad \qquad &\Leftrightarrow \qquad \frac{\delta L[q(t)]}{\delta q(t)} = 0 \,\,\,\, \forall \delta q(t)\\ \delta q(t) \mathrm{\,is\, a\,}\mathbf{symmetry} \qquad &\Leftrightarrow \qquad \frac{\delta L[q(t)]}{\delta q(t)} = 0 \,\,\,\, \forall q(t). \end{align*}

There are many subtleties obscured in this cartoon presentation, like the fact that a symmetry \delta q(t), being a tangent direction on the manifold of trajectories, can vary with the tangent point q(t) it is attached to (as for rotational symmetries). If you’ve never spent a long afternoon with a good book on the calculus of variations, I recommend it.
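To make the cartoon concrete, here is a minimal worked sketch for a single particle in one dimension. The action S[q], the potential V(q), and the choice of a constant shift as the candidate symmetry are illustrative assumptions on my part, not taken from the Bañados and Reyes review; S[q] plays the role of the functional written L[q(t)] above.

```latex
\begin{align*}
S[q] &= \int_{t_1}^{t_2}\! dt \left[ \tfrac{1}{2} m \dot{q}^2 - V(q) \right], \\
\delta S &= \int_{t_1}^{t_2}\! dt \left[ -m\ddot{q} - V'(q) \right] \delta q(t)
          + \Big[ m \dot{q}\, \delta q \Big]_{t_1}^{t_2}. \\[4pt]
\textbf{Solution:}\quad & \delta S = 0 \;\; \forall\, \delta q(t) \text{ vanishing at } t_1, t_2
   \;\;\Leftrightarrow\;\; m\ddot{q} = -V'(q). \\
\textbf{Symmetry:}\quad & \delta q(t) = \epsilon \text{ (constant)}
   \;\Rightarrow\; \delta S = -\epsilon \int_{t_1}^{t_2}\! dt\, V'(q),
   \text{ which vanishes } \forall\, q(t) \;\Leftrightarrow\; V' = 0. \\
\textbf{Both:}\quad & 0 = \delta S = \epsilon \Big[ m \dot{q} \Big]_{t_1}^{t_2}
   \;\Rightarrow\; m\dot{q}(t_2) = m\dot{q}(t_1).
\end{align*}
```

The last line is the Noether-style conclusion in this toy case: evaluating a symmetry direction on a solution kills the bulk term, so only the boundary term survives and the momentum m\dot{q} is conserved.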

[continue reading]

How to think about Quantum Mechanics—Part 7: Quantum chaos and linear evolution

[Other parts in this series: 1,2,3,4,5,6,7.]

You’re taking a vacation to Granada to enjoy a Spanish ski resort in the Sierra Nevada mountains. But as your plane is coming in for a landing, you look out the window and realize the airport is on a small tropical island. Confused, you ask the flight attendant what’s wrong. “Oh”, she says, looking at your ticket, “you’re trying to get to Granada, but you’re on the plane to Grenada in the Caribbean Sea.” A wave of distress comes over your face, but she reassures you: “Don’t worry, Granada isn’t that far from here. The Hamming distance is only 1!”.

After you’ve recovered from that side-splitting humor, let’s dissect the frog. What’s the basis of the joke? The flight attendant is conflating two different metrics: the geographic distance and the Hamming distance. The distances are completely distinct, as two named locations can be very nearby in one and very far apart in the other.
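For readers who want to poke at the two metrics directly, here is a small Python sketch. The function names and the approximate coordinates (Granada in Spain, St. George’s in Grenada) are assumptions chosen for illustration, and the great-circle distance uses the standard haversine formula on a spherical Earth.

```python
import math

def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Approximate (latitude, longitude) in degrees; assumed values for illustration.
GRANADA_SPAIN = (37.18, -3.60)
GRENADA_CARIBBEAN = (12.05, -61.75)

name_distance = hamming_distance("Granada", "Grenada")
geo_distance = great_circle_km(*GRANADA_SPAIN, *GRENADA_CARIBBEAN)

print(name_distance)        # 1: the names differ only in the third letter
print(round(geo_distance))  # several thousand kilometres
```

The same pair of points is at distance 1 in one metric and an ocean apart in the other, which is all the joke needs.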

Now let’s hear another joke from renowned physicist Chris Jarzynski:

The linear Schrödinger equation, however, does not give rise to the sort of nonlinear, chaotic dynamics responsible for ergodicity and mixing in classical many-body systems. This suggests that new concepts are needed to understand thermalization in isolated quantum systems. – C. Jarzynski, “Diverse phenomena, common themes” [PDF]

Ha! Get it? This joke is so good it’s been told by S. Wimberger: “Since quantum mechanics is the more fundamental theory we can ask ourselves if there is chaotic motion in quantum systems as well.… [continue reading]

Reeh–Schlieder property in a separable Hilbert space

As has been discussed here before, the Reeh–Schlieder theorem is an initially confusing property of the vacuum in quantum field theory. It is difficult to find an illuminating discussion of it in the literature, whether in the context of algebraic QFT (from which it originated) or the more modern QFT grounded in RG and effective theories. I expect this to change once more field theorists get trained in quantum information.

The Reeh–Schlieder theorem states that the vacuum \vert 0 \rangle is cyclic with respect to the algebra \mathcal{A}(\mathcal{O}) of observables localized in some subset \mathcal{O} of Minkowski space. (For a single field \phi(x), the algebra \mathcal{A}(\mathcal{O}) is defined to be generated by all finite smearings \phi_f = \int\! dx\, f(x)\phi(x) for f(x) with support in \mathcal{O}.) Here, “cyclic” means that the subspace \mathcal{H}^{\mathcal{O}} \equiv \mathcal{A}(\mathcal{O})\vert 0 \rangle is dense in \mathcal{H}, i.e., any state \vert \chi \rangle \in \mathcal{H} can be arbitrarily well approximated by a state of the form A \vert 0 \rangle with A \in \mathcal{A}(\mathcal{O}). This is initially surprising because \vert \chi \rangle could be a state with particle excitations localized (essentially) to a region far from \mathcal{O} and that looks (essentially) like the vacuum everywhere else. The resolution derives from the fact that the vacuum is highly entangled, such that every region is entangled with every other region by an exponentially small amount.

One mistake that’s easy to make is to be fooled into thinking that this property can only be found in systems, like a field theory, with an infinite number of degrees of freedom. So let me exhibit a quantum state with the Reeh–Schlieder property that lives in the tensor product of a finite number of separable Hilbert spaces (most likely a state with this property already exists in the quantum info literature, but I’ve got a habit of re-inventing the wheel; for my last paper, I spent the better part of a month rediscovering the Shor code…):

    \[\mathcal{H} = \bigotimes_{n=1}^N \mathcal{H}_n, \qquad \mathcal{H}_n = \mathrm{span}\left\{ \vert s \rangle_n \right\}_{s=1}^\infty\]

As emphasized above, a separable Hilbert space is one that has a countable orthonormal basis and, when infinite-dimensional, is therefore isomorphic to L^2(\mathbb{R}), the space of square-integrable functions.… [continue reading]

Legendre transform

The way that most physicists teach and talk about partial differential equations is horrible, and has surprisingly big costs for the typical understanding of the foundations of the field even among professionals. The chief victims are students of thermodynamics and analytical mechanics, and I’ve mentioned before that the preface of Sussman and Wisdom’s Structure and Interpretation of Classical Mechanics is a good starting point for thinking about these issues. As a pointed example, in this blog post I’ll look at how badly the Legendre transform is taught in standard textbooks, and compare it to how it could be taught. (I was pleased to note as this essay went to press that my choices of Landau, Goldstein, and Arnold were confirmed as the “standard” suggestions by the top Google results.) In a subsequent post, I’ll use this as a springboard for complaining about the way we record and transmit physics knowledge.

Before we begin: turn away from the screen and see if you can remember what the Legendre transform accomplishes mathematically in classical mechanics. (If not, can you remember the definition? I couldn’t, a month ago.) I don’t just mean that the Legendre transform converts the Lagrangian into the Hamiltonian and vice versa, but rather: what key mathematical/geometric property does the Legendre transform have, compared to the cornucopia of other function transforms, that allows it to connect these two conceptually distinct formulations of mechanics?

(Analogously, the question “What is useful about the Fourier transform for understanding translationally invariant systems?” can be answered by something like “Translationally invariant operations in the spatial domain correspond to multiplication in the Fourier domain” or “The Fourier transform is a change of basis, within the vector space of functions, using translationally invariant basis elements, i.e., the Fourier modes”.)

The status quo

Let’s turn to the canonical text by Goldstein for an example of how the Legendre transform is usually introduced.… [continue reading]

Toward relativistic branches of the wavefunction

I prepared the following extended abstract for the Spacetime and Information Workshop as part of my continuing mission to corrupt physicists while they are still young and impressionable. I reproduce it here for your reading pleasure.


Finding a precise definition of branches in the wavefunction of closed many-body systems is crucial to conceptual clarity in the foundations of quantum mechanics. Toward this goal, we propose amplification, which can be quantified, as the key feature characterizing anthropocentric measurement; this immediately and naturally extends to non-anthropocentric amplification, such as the ubiquitous case of classically chaotic degrees of freedom decohering. Amplification can be formalized as the production of redundant records distributed over spatially disjoint regions, a certain form of multi-partite entanglement in the pure quantum state of a large closed system. If this definition can be made rigorous and shown to be unique, it is then possible to ask many compelling questions about how branches form and evolve.

A recent result shows that branch decompositions are highly constrained just by this requirement that they exhibit redundant local records. The set of all redundantly recorded observables induces a preferred decomposition into simultaneous eigenstates unless their records are highly extended and delicately overlapping, as exemplified by the Shor error-correcting code. A maximum length scale for records is enough to guarantee uniqueness. However, this result is grounded in a preferred tensor decomposition into independent microscopic subsystems associated with spatial locality. This structure breaks down in a relativistic setting on scales smaller than the Compton wavelength of the relevant field. Indeed, a key insight from algebraic quantum field theory is that finite-energy states are never exact eigenstates of local operators, and hence never have exact records that are spatially disjoint, although they can approximate this arbitrarily well on large scales.… [continue reading]

Branches and matrix-product states

I’m happy to use this bully pulpit to advertise that the following paper has been deemed “probably not terrible”, i.e., published.

When the wave function of a large quantum system unitarily evolves away from a low-entropy initial state, there is strong circumstantial evidence it develops “branches”: a decomposition into orthogonal components that is indistinguishable from the corresponding incoherent mixture with feasible observations. Is this decomposition unique? Must the number of branches increase with time? These questions are hard to answer because there is no formal definition of branches, and most intuition is based on toy models with arbitrarily preferred degrees of freedom. Here, assuming only the tensor structure associated with spatial locality, I show that branch decompositions are highly constrained just by the requirement that they exhibit redundant local records. The set of all redundantly recorded observables induces a preferred decomposition into simultaneous eigenstates unless their records are highly extended and delicately overlapping, as exemplified by the Shor error-correcting code. A maximum length scale for records is enough to guarantee uniqueness. Speculatively, objective branch decompositions may speed up numerical simulations of nonstationary many-body states, illuminate the thermalization of closed systems, and demote measurement from fundamental primitive in the quantum formalism.

Here’s the figure (the editor tried to convince me that this figure appeared on the cover for purely aesthetic reasons and that this does not mean my letter is the best thing in the issue… but I know better!) and caption:


Spatially disjoint regions with the same coloring (e.g., the solid blue regions \mathcal{F}, \mathcal{F}', \ldots) denote different records for the same observable (e.g., \Omega_a = \{\Omega_a^{\mathcal{F}},\Omega_a^{\mathcal{F}'},\ldots\}).
[continue reading]

Comments on Cotler, Penington, & Ranard

One way to think about the relevance of decoherence theory to measurement in quantum mechanics is that it reduces the preferred basis problem to the preferred subsystem problem; merely specifying the system of interest (by delineating it from its environment or measuring apparatus) is enough, in important special cases, to derive the measurement basis. But this immediately prompts the question: what are the preferred systems? I spent some time in grad school with my advisor trying to see if I could identify a preferred system just by looking at a large many-body Hamiltonian, but never got anything worth writing up.

I’m pleased to report that Cotler, Penington, and Ranard have tackled a closely related problem, and made a lot more progress:

Locality from the Spectrum
Jordan S. Cotler, Geoffrey R. Penington, Daniel H. Ranard
Essential to the description of a quantum system are its local degrees of freedom, which enable the interpretation of subsystems and dynamics in the Hilbert space. While a choice of local tensor factorization of the Hilbert space is often implicit in the writing of a Hamiltonian or Lagrangian, the identification of local tensor factors is not intrinsic to the Hilbert space itself. Instead, the only basis-invariant data of a Hamiltonian is its spectrum, which does not manifestly determine the local structure. This ambiguity is highlighted by the existence of dualities, in which the same energy spectrum may describe two systems with very different local degrees of freedom. We argue that in fact, the energy spectrum alone almost always encodes a unique description of local degrees of freedom when such a description exists, allowing one to explicitly identify local subsystems and how they interact.
[continue reading]

Singular value decomposition in bra-ket notation

In linear algebra, and therefore quantum information, the singular value decomposition (SVD) is elementary, ubiquitous, and beautiful. However, I only recently realized that its expression in bra-ket notation is very elegant. The SVD is equivalent to the statement that any operator \hat{M} can be expressed as

(1)   \begin{align*} \hat{M} = \sum_i \vert A_i \rangle \lambda_i \langle B_i \vert \end{align*}

where \vert A_i \rangle and \vert B_i \rangle are orthonormal sets of vectors, possibly in Hilbert spaces with different dimensionality, and the \lambda_i \ge 0 are the singular values.
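As a quick numerical check of this form, here is a minimal NumPy sketch. The random 4×3 matrix and the variable names are illustrative assumptions; the only library call relied on is `numpy.linalg.svd`, which returns the factors of M = U diag(s) V†, so that the columns of `U` play the role of \vert A_i \rangle and the rows of `Vh` play the role of \langle B_i \vert.

```python
import numpy as np

rng = np.random.default_rng(0)
# A rectangular complex "operator" mapping a 3-dimensional space to a 4-dimensional one.
M = rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3))

# NumPy returns M = U @ diag(s) @ Vh, with singular values s >= 0.
U, s, Vh = np.linalg.svd(M, full_matrices=False)

# |A_i> is the i-th column of U; <B_i| is the i-th row of Vh (already conjugated).
M_rebuilt = sum(s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(len(s)))

assert np.allclose(M, M_rebuilt)                       # M = sum_i |A_i> s_i <B_i|
assert np.allclose(U.conj().T @ U, np.eye(len(s)))     # <A_i|A_j> = delta_ij
assert np.allclose(Vh @ Vh.conj().T, np.eye(len(s)))   # <B_i|B_j> = delta_ij
```

Note that the two orthonormal sets live in spaces of different dimension here (4 and 3), matching the remark above about differing dimensionality.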

That’s it.… [continue reading]

Comments on Bousso’s communication bound

Bousso has a recent paper bounding the maximum information that can be sent by a signal from first principles in QFT:

I derive a universal upper bound on the capacity of any communication channel between two distant systems. The Holevo quantity, and hence the mutual information, is at most of order E\Delta t/\hbar, where E is the average energy of the signal, and \Delta t is the amount of time for which detectors operate. The bound does not depend on the size or mass of the emitting and receiving systems, nor on the nature of the signal. No restrictions on preparing and processing the signal are imposed. As an example, I consider the encoding of information in the transverse or angular position of a signal emitted and received by systems of arbitrarily large cross-section. In the limit of a large message space, quantum effects become important even if individual signals are classical, and the bound is upheld.

Here’s his first figure:



This all stems from vacuum entanglement, an oft-neglected aspect of QFT that Bousso doesn’t emphasize in the paper as the key ingredient. (I thank Scott Aaronson for first pointing this out.) The gradient term in the Hamiltonian for QFTs means that the value of the field at two nearby locations is always entangled. In particular, the values of \phi(x) and \phi(x+\Delta x) are sometimes considered independent degrees of freedom but, for a state with bounded energy, they can’t actually take arbitrarily different values as \Delta x becomes small, or else the gradient contribution to the Hamiltonian violates the energy bound. Technically this entanglement exists over arbitrary distances, but it is exponentially suppressed on scales larger than the Compton wavelength of the field.… [continue reading]

How to think about Quantum Mechanics—Part 1: Measurements are about bases

[This post was originally “Part 0”, but it’s been moved. Other parts in this series: 1,2,3,4,5,6,7.]

In an ideal world, the formalism that you use to describe a physical system is in a one-to-one correspondence with the physically distinct configurations of the system. But sometimes it can be useful to introduce additional descriptions, in which case it is very important to understand the unphysical over-counting (e.g., gauge freedom). A scalar potential V(x) is a very convenient way of representing the vector force field, F(x) = -\partial V(x), but any constant shift in the potential, V(x) \to V(x) + V_0, yields forces and dynamics that are indistinguishable, and hence the value of the potential on an absolute scale is unphysical.

One often hears that a quantum experiment measures an observable, but this is wrong, or very misleading, because it vastly over-counts the physically distinct sorts of measurements that are possible. It is much more precise to say that a given apparatus, with a given setting, simultaneously measures all observables with the same eigenvectors. More compactly, an apparatus measures an orthogonal basis – not an observable. (We can also allow for the measured observable to be degenerate, in which case the apparatus simultaneously measures all observables with the same degenerate eigenspaces. To be abstract, you could say it measures a commuting subalgebra, with the nondegenerate case corresponding to the subalgebra having maximum dimensionality, i.e., the same number of dimensions as the Hilbert space. Commuting subalgebras with maximum dimension are in one-to-one correspondence with orthonormal bases, modulo multiplying the vectors by pure phases.) You can probably start to see this by just noting that there’s no actual, physical difference between measuring X and X^3; the apparatuses that would perform the two measurements are identical.… [continue reading]

Bleg: Classical theory of measurement and amplification

I’m in search of an authoritative reference giving a foundational/information-theoretic approach to classical measurement. What abstract physical properties are necessary and sufficient?

Motivation: The Copenhagen interpretation treats the measurement process as a fundamental primitive, and this persists in most uses of quantum mechanics outside of foundations. Of course, the modern view is that the measurement process is just another physical evolution, where the state of a macroscopic apparatus is conditioned on the state of a microscopic quantum system in some basis determined by their mutual interaction Hamiltonian. The apparent nonunitary aspects of the evolution inferred by the observer arise because the measured system is coupled to the observer himself; the global evolution of the system-apparatus-observer system is formally modeled as unitary (although the philosophical meaningfulness/ontology/reality of the components of the wavefunction corresponding to different measurement outcomes is disputed).

Eventually, we’d like to be able to identify all laboratory measurements as just an anthropocentric subset of wavefunction branching events. I am very interested in finding a mathematically precise criterion for branching. (Note that the branches themselves may be only precisely defined in some large-N or thermodynamic limit.) Ideally, I would like to find a property that everyone agrees must apply, at the least, to laboratory measurement processes, and (with as little change as possible) use this to find all branches, not just ones that result from laboratory measurements. (Right now I find the structure of spatially-redundant information in the many-body wavefunction to be a very promising approach.)

It seems sensible to begin with what is necessary for a classical measurement since these ought to be analyzable without all the philosophical baggage that plagues discussion of quantum measurement.… [continue reading]

Comments on an essay by Wigner

[PSA: Happy 4th of July. Juno arrives at Jupiter tonight!]

This is short and worth reading:

The sharp distinction between Initial Conditions and Laws of Nature was initiated by Isaac Newton and I consider this to be one of his most important, if not the most important, accomplishment. Before Newton there was no sharp separation between the two concepts. Kepler, to whom we owe the three precise laws of planetary motion, tried to explain also the size of the planetary orbits, and their periods. After Newton's time the sharp separation of initial conditions and laws of nature was taken for granted and rarely even mentioned. Of course, the first ones are quite arbitrary and their properties are hardly parts of physics while the recognition of the latter ones are the prime purpose of our science. Whether the sharp separation of the two will stay with us permanently is, of course, as uncertain as is all future development but this question will be further discussed later. Perhaps it should be mentioned here that the permanency of the validity of our deterministic laws of nature became questionable as a result of the realization, due initially to D. Zeh, that the states of macroscopic bodies are always under the influence of their environment; in our world they can not be kept separated from it.

This essay has no formal abstract; the above is the second paragraph, which I find to be profound. Here is the PDF. The essay shares the same name and much of the material with Wigner’s 1963 Nobel lecture [PDF]. (The Nobel lecture has a nice bit contrasting invariance principles with covariance principles, and dynamical invariance principles with geometrical invariance principles.) [continue reading]