Branches as hidden nodes in a neural net

I had been vaguely aware that there was an important connection between tensor network representations of quantum many-body states (e.g., matrix product states) and artificial neural nets, but it didn’t really click for me until I saw Roger Melko’s nice talk on Friday about his recent paper with Torlai et al.:There is a title card about “resurgence” from Francesco Di Renzo’s talk at the beginning of the video, which you can ignore; it’s just a mistake in KITP’s video system. a  






In particular, he sketched the essential equivalence between matrix product states (MPS) and restricted Boltzmann machinesThis is discussed in detail by Chen et al. See also good intuition and a helpful physicist-statistician dictionary from Lin and Tegmark. b   (RBM) before showing how he and collaborators could train efficient RBM representations of the states of the transverse-field Ising and XXZ models using a small number of local measurements on the true state.
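For concreteness, here is a minimal sketch of the RBM wavefunction ansatz used in this line of work (my own toy code, not taken from the talk; the couplings are random placeholders). The hidden units can be summed out analytically, leaving a product of cosh factors, with one convention being spins and hidden units valued in {+1, −1}:

```python
import numpy as np

def rbm_amplitude(sigma, a, b, W):
    """Unnormalized RBM wavefunction amplitude psi(sigma) for spins sigma_i = +/-1.

    Hidden units h_j = +/-1 are summed out analytically:
      psi(sigma) = exp(a . sigma) * prod_j 2 cosh(b_j + sum_i sigma_i W_ij)
    """
    return np.exp(a @ sigma) * np.prod(2 * np.cosh(b + sigma @ W))

# Toy instance: 4 visible spins, 2 hidden units, random placeholder couplings.
rng = np.random.default_rng(0)
n_vis, n_hid = 4, 2
a, b = rng.normal(size=n_vis), rng.normal(size=n_hid)
W = rng.normal(size=(n_vis, n_hid))

# Enumerate all 2^4 spin configurations and normalize the resulting state vector.
configs = np.array([[1 if (k >> i) & 1 else -1 for i in range(n_vis)]
                    for k in range(2 ** n_vis)])
psi = np.array([rbm_amplitude(s, a, b, W) for s in configs])
psi /= np.linalg.norm(psi)
```

The number of hidden units plays a role loosely analogous to the bond dimension of an MPS; the mapping by Chen et al. makes that analogy precise.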

As you’ve heard me belabor ad nauseam, I think identifying and defining branches is the key outstanding task inhibiting progress in resolving the measurement problem. I had already been thinking of branches as a sort of “global” tensor in an MPS, i.e., there would be a single index (bond) that would label the branches and serve to efficiently encode a pure state with long-range entanglement due to the amplification that defines a physical measurement process. (More generally, you can imagine branching events whose effects haven’t propagated outside of some region, such as a light-cone or Lieb-Robinson bound, and you might even make a hand-wavy connection to entanglement renormalization.) But I had little experience with constructing MPSs, and finding efficient representations always seemed like an ad-hoc process yielding non-unique results.… [continue reading]

Links for October 2017

[continue reading]

Models of decoherence and branching

[This is akin to a living review, which will hopefully improve from time to time. Last edited 2017-10-29.]

This post will collect some models of decoherence and branching. We don’t have a rigorous definition of branches yet, but I crudely define models of branching to be models of decoherenceI take decoherence to mean a model with dynamics taking the form U \approx \sum_i \vert S_i\rangle\langle S_i |\otimes U^{\mathcal{E}}_i for some tensor decomposition \mathcal{H} = \mathcal{S} \otimes \mathcal{E}, where \{\vert S_i\rangle\} is an (approximately) stable orthonormal basis independent of initial state, and where \mathrm{Tr}[ U^{\mathcal{E}}_i \rho^{\mathcal{E}}_0 U^{\mathcal{E}\dagger}_j ] \approx 0 for times t \gtrsim t_D and i \neq j, where \rho^{\mathcal{E}}_0 is the initial state of \mathcal{E} and t_D is some characteristic time scale. a   which additionally feature some combination of amplification, irreversibility, redundant records, and/or outcomes with an intuitive macroscopic interpretation. I have the following desiderata for models, which tend to be in tension:

  • computational tractability
  • physically realistic
  • symmetric (e.g., translationally)
  • no ad-hoc system-environment distinction
  • Ehrenfest evolution along classical phase-space trajectories (at least on Lyapunov timescales)

Regarding that last one: we would like to recover “classical behavior” in the sense of classical Hamiltonian flow, which (presumably) means continuous degrees of freedom.In principle you could have discrete degrees of freedom that limit, as \hbar\to 0, to some sort of discrete classical system, but most people find this unsatisfying. b   Branching only becomes unambiguous in some large-N limit, so it seems that satisfying models are necessarily messy and difficult to simulate numerically. At a minimum, a good model needs time asymmetry (in the initial state, not the dynamics), sensitive dependence on initial conditions, and a large bath. Most branching will (presumably) be continuous both in time and in number of branches, like a decaying atom where neither the direction nor the time of decay is discrete.… [continue reading]
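As a toy illustration of the decoherence condition in the footnote above, here is a sketch (my own, with Haar-random unitaries standing in for realistic environment dynamics): the conditional environment states U_i|0⟩ become nearly orthogonal once the environment is large, so the overlap Tr[U_i ρ_0 U_j†] is small for i ≠ j.

```python
import numpy as np

rng = np.random.default_rng(1)

def haar_unitary(d):
    """Haar-random unitary via QR of a complex Gaussian matrix."""
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, r = np.linalg.qr(z)
    # Fix the phases of the diagonal of r so the distribution is exactly Haar.
    return q * (np.diag(r) / np.abs(np.diag(r)))

d_env = 2 ** 10                         # a 10-qubit environment
U0, U1 = haar_unitary(d_env), haar_unitary(d_env)

env0 = np.zeros(d_env)
env0[0] = 1.0                           # pure initial environment state |0>

# Decoherence condition for i != j:
#   Tr[U_i rho0 U_j^dag] = <0| U_j^dag U_i |0>, of order 1/sqrt(d_env) here.
overlap = np.vdot(U1 @ env0, U0 @ env0)
```

Realistic models replace the Haar-random U_i with local Hamiltonian evolution, in which case the overlap decays over the decoherence timescale t_D rather than being small instantly.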

Comments on Weingarten’s preferred branch

A senior colleague asked me for thoughts on this paper describing a single-preferred-branch flavor of quantum mechanics, and I thought I’d copy them here. Tl;dr: I did not find an important new idea in it, but this paper nicely illustrates the appeal of Finkelstein’s partial-trace decoherence and the ambiguity inherent in connecting a many-worlds wavefunction to our direct observations.


We propose a method for finding an initial state vector which by ordinary Hamiltonian time evolution follows a single branch of many-worlds quantum mechanics. The resulting deterministic system appears to exhibit random behavior as a result of the successive emergence over time of information present in the initial state but not previously observed.

We start by assuming that a precise wavefunction branch structure has been specified. The idea, basically, is to randomly draw a branch at late times according to the Born probability, then to evolve it backwards in time to the beginning of the universe and take that as your initial condition. The main motivating observation is that, if we assume that all branch splittings are defined by a projective decomposition of some subsystem (‘the system’) which is recorded faithfully elsewhere (‘the environment’), then the lone preferred branch — time-evolving by itself — is an eigenstate of each of the projectors defining the splits. In a sense, Weingarten lays claim to ordered consistency [arxiv:gr-qc/9607073] by assuming partial-trace decoherenceNote on terminology: What Finkelstein called “partial-trace decoherence” is really a specialized form of consistency (i.e., a mathematical criterion for sets of consistent histories) that captures some, but not all, of the properties of the physical and dynamical process of decoherence.[continue reading]

Symmetries and solutions

Here is an underemphasized way to frame the relationship between trajectories and symmetries (in the sense of Noether’s theorem)You can find this presentation in “A short review on Noether’s theorems, gauge symmetries and boundary terms” by Máximo Bañados and Ignacio A. Reyes (H/t Godfrey Miller). a  . Consider the space of all possible trajectories q(t) for a system, a real-valued Lagrangian functional L[q(t)] on that space, the “directions” \delta q(t) at each point, and the corresponding functional gradient \delta L[q(t)]/\delta q(t) in each direction. Classical solutions are exactly those trajectories q(t) such that the Lagrangian L[q(t)] is stationary for perturbations in any direction \delta q(t), and continuous symmetries are exactly those directions \delta q(t) such that the Lagrangian L[q(t)] is stationary for any trajectory q(t). That is,

(1)   \begin{align*} q(t) \mathrm{\,is\, a\,}\mathbf{solution}\quad \qquad &\Leftrightarrow \qquad \frac{\delta L[q(t)]}{\delta q(t)} = 0 \,\,\,\, \forall \delta q(t)\\ \delta q(t) \mathrm{\,is\, a\,}\mathbf{symmetry} \qquad &\Leftrightarrow \qquad \frac{\delta L[q(t)]}{\delta q(t)} = 0 \,\,\,\, \forall q(t). \end{align*}

There are many subtleties obscured in this cartoon presentation, like the fact that a symmetry \delta q(t), being a tangent direction on the manifold of trajectories, can vary with the tangent point q(t) it is attached to (as for rotational symmetries). If you’ve never spent a long afternoon with a good book on the calculus of variations, I recommend it.
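As a quick numerical companion to Eq. (1) (my own sketch, not from Bañados and Reyes): discretize the harmonic-oscillator action and check that the functional gradient, equivalently the Euler-Lagrange residual, vanishes on a true solution but not on an arbitrary path with the same endpoints.

```python
import numpy as np

# Harmonic oscillator with m = omega = 1; q(t) = cos(t) is a classical solution.
T, N = 1.0, 1000
t = np.linspace(0.0, T, N + 1)
dt = t[1] - t[0]

def el_residual(q):
    """Gradient of the discretized action (divided by dt) at interior points.

    S = sum_k [ (q_{k+1} - q_k)^2 / (2 dt) - dt * ((q_k + q_{k+1}) / 2)^2 / 2 ],
    so the residual approximates -(q'' + q), which vanishes on solutions.
    """
    kinetic = (2 * q[1:-1] - q[:-2] - q[2:]) / dt**2
    potential = -(q[:-2] + 2 * q[1:-1] + q[2:]) / 4
    return kinetic + potential

exact = np.cos(t)                              # a true solution
straight = np.linspace(1.0, np.cos(T), N + 1)  # same endpoints, not a solution

print(np.abs(el_residual(exact)).max())     # tiny: stationary in every direction
print(np.abs(el_residual(straight)).max())  # order one: not stationary
```

The symmetry half of Eq. (1) is the harder one to see numerically, since it asks for stationarity of L along one fixed direction δq(t) for every trajectory at once.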

[continue reading]

Links for August-September 2017

  • Popular-level introduction to the five methods used to identify exoplanets.
  • Another good profile of the SEP.
  • ArXiv gets some money to improve stuff.
  • Flying fish are hard to believe. It’s something of a tragedy that fish capable of long-distance flight never evolved (that we know of?). They are so bird-like it’s startling, and this ability has evolved independently multiple times.
  • In addition to Russia and China, the US also at one time had ICBMs deployed by rail.
  • On nuclear decommissioning:

    For nuclear power plants governed by the United States Nuclear Regulatory Commission, SAFSTOR (SAFe STORage) is one of the options for nuclear decommissioning of a shut down plant. During SAFSTOR the de-fuelled plant is monitored for up to sixty years before complete decontamination and dismantling of the site, to a condition where nuclear licensing is no longer required. During the storage interval, some of the radioactive contaminants of the reactor and power plant will decay, which will reduce the quantity of radioactive material to be removed during the final decontamination phase.

    The other options set by the NRC are nuclear decommissioning, which is immediate dismantling of the plant and remediation of the site, and nuclear entombment, which is the enclosure of contaminated parts of the plant in a permanent layer of concrete. Mixtures of options may be used; for example, immediate removal of steam turbine components and condensers, and SAFSTOR for the more heavily radioactive containment vessel. Since the NRC requires decommissioning to be completed within 60 years, ENTOMB is not usually chosen, since not all activity will have decayed to an unregulated background level in that time.

  • The fraction of the federal budget devoted to NASA peaked in 1966, three years before the Moon landing.
[continue reading]

How to think about Quantum Mechanics—Part 7: Quantum chaos and linear evolution

[Other parts in this series: 1,2,3,4,5,6,7.]

You’re taking a vacation to Granada to enjoy a Spanish ski resort in the Sierra Nevada mountains. But as your plane is coming in for a landing, you look out the window and realize the airport is on a small tropical island. Confused, you ask the flight attendant what’s wrong. “Oh”, she says, looking at your ticket, “you’re trying to get to Granada, but you’re on the plane to Grenada in the Caribbean Sea.” A wave of distress comes over your face, but she reassures you: “Don’t worry, Granada isn’t that far from here. The Hamming distance is only 1!”.

After you’ve recovered from that side-splitting humor, let’s dissect the frog. What’s the basis of the joke? The flight attendant is conflating two different metrics: the geographic distance and the Hamming distance. The distances are completely distinct, as two named locations can be very nearby in one and very far apart in the other.
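In code, the flight attendant’s metric is just a positional character count (a minimal sketch):

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))

print(hamming("Granada", "Grenada"))  # -> 1: they differ only in the third letter
```

The geographic distance between the two cities, of course, is several thousand kilometers.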

Now let’s hear another joke from renowned physicist Chris Jarzynski:

The linear Schrödinger equation, however, does not give rise to the sort of nonlinear, chaotic dynamics responsible for ergodicity and mixing in classical many-body systems. This suggests that new concepts are needed to understand thermalization in isolated quantum systems. – C. Jarzynski, “Diverse phenomena, common themes” [PDF]

Ha! Get it? This joke is so good it’s been told by S. Wimberger“Since quantum mechanics is the more fundamental theory we can ask ourselves if there is chaotic motion in quantum systems as well.[continue reading]

Links for June-July 2017

[continue reading]

Selsam on formal verification of machine learning

Here is the first result out of the project Verifying Deep Mathematical Properties of AI SystemsTechnical abstract available here. Note that David Dill has taken over as PI from Alex Aiken. a   funded through the Future of Life Institute.

Noisy data, non-convex objectives, model misspecification, and numerical instability can all cause undesired behaviors in machine learning systems. As a result, detecting actual implementation errors can be extremely difficult. We demonstrate a methodology in which developers use an interactive proof assistant to both implement their system and to state a formal theorem defining what it means for their system to be correct. The process of proving this theorem interactively in the proof assistant exposes all implementation errors since any error in the program would cause the proof to fail. As a case study, we implement a new system, Certigrad, for optimizing over stochastic computation graphs, and we generate a formal (i.e. machine-checkable) proof that the gradients sampled by the system are unbiased estimates of the true mathematical gradients. We train a variational autoencoder using Certigrad and find the performance comparable to training the same model in TensorFlow.

You can find discussion on HackerNews. The lead author was kind enough to answer some questions about this work.

Q: Is the correctness specification usually a fairly singular statement? Or will it often be of the form “The program satisfied properties A, B, C, D, and E”? (And then maybe you add “F” later.)

Daniel Selsam: There are a few related issues: how singular is a specification, how much of the functionality of the system is certified (coverage), and how close the specification comes to proving that the system actually does what you want (validation).… [continue reading]
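For readers who want the flavor of the property Certigrad certifies, here is a toy Monte Carlo check (my own sketch, not Certigrad’s Lean code) that a reparameterized stochastic gradient is an unbiased estimate of the true gradient. For x ~ N(θ, 1) we have E[x²] = θ² + 1, so the true gradient is 2θ, and the sampled gradients 2(θ + ε) average to it:

```python
import numpy as np

rng = np.random.default_rng(42)
theta = 1.5

# Reparameterization: x = theta + eps with eps ~ N(0, 1), objective f(x) = x^2.
# Each sampled gradient is d/dtheta (theta + eps)^2 = 2 * (theta + eps).
eps = rng.normal(size=1_000_000)
grad_samples = 2 * (theta + eps)

print(grad_samples.mean())   # close to the true gradient 2 * theta = 3.0
```

Certigrad’s contribution is proving this kind of unbiasedness as a machine-checked theorem about the implementation itself, rather than spot-checking it statistically as above.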

Reeh–Schlieder property in a separable Hilbert space

As has been discussed here before, the Reeh–Schlieder theorem is an initially confusing property of the vacuum in quantum field theory. It is difficult to find an illuminating discussion of it in the literature, whether in the context of algebraic QFT (from which it originated) or the more modern QFT grounded in RG and effective theories. I expect this to change once more field theorists get trained in quantum information.

The Reeh–Schlieder theorem states that the vacuum \vert 0 \rangle is cyclic with respect to the algebra \mathcal{A}(\mathcal{O}) of observables localized in some subset \mathcal{O} of Minkowski space. (For a single field \phi(x), the algebra \mathcal{A}(\mathcal{O}) is defined to be generated by all finite smearings \phi_f = \int\! dx\, f(x)\phi(x) for f(x) with support in \mathcal{O}.) Here, “cyclic” means that the subspace \mathcal{H}^{\mathcal{O}} \equiv \mathcal{A}(\mathcal{O})\vert 0 \rangle is dense in \mathcal{H}, i.e., any state \vert \chi \rangle \in \mathcal{H} can be arbitrarily well approximated by a state of the form A \vert 0 \rangle with A \in \mathcal{A}(\mathcal{O}). This is initially surprising because \vert \chi \rangle could be a state with particle excitations localized (essentially) to a region far from \mathcal{O} and that looks (essentially) like the vacuum everywhere else. The resolution derives from the fact that the vacuum is highly entangled, such that every region is entangled with every other region, albeit by an exponentially small amount.
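A finite-dimensional warm-up for the notion of cyclicity (my own illustration, not the theorem itself): a maximally entangled state is cyclic for the algebra of operators acting on just one tensor factor, since the matrix units E_kl acting on one side already map it onto a complete basis of the joint space.

```python
import numpy as np

d = 3                                          # local dimension
phi = np.eye(d).reshape(d * d) / np.sqrt(d)    # |Phi> = sum_i |ii> / sqrt(d)

# Act on the FIRST factor only with the matrix units E_kl (a basis of local ops).
vectors = []
for k in range(d):
    for l in range(d):
        E = np.zeros((d, d))
        E[k, l] = 1.0
        # (E_kl (x) I)|Phi> = |k>|l> / sqrt(d)
        vectors.append(np.kron(E, np.eye(d)) @ phi)

# The d^2 resulting vectors span the entire d^2-dimensional Hilbert space.
rank = np.linalg.matrix_rank(np.array(vectors))
print(rank)   # -> 9, i.e., d^2: |Phi> is cyclic for operators on one factor
```

The hard part of Reeh–Schlieder is that the vacuum has only exponentially small entanglement between distant regions, so the approximating operators must be enormous; the maximally entangled toy state hides that difficulty.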

One mistake that’s easy to make is to be fooled into thinking that this property can only be found in systems, like a field theory, with an infinite number of degrees of freedom. So let me exhibitMost likely a state with this property already exists in the quantum info literature, but I’ve got a habit of re-inventing the wheel. For my last paper, I spent the better part of a month rediscovering the Shor code… a   a quantum state with the Reeh–Schlieder property that lives in the tensor product of a finite number of separable Hilbert spaces:

    \[\mathcal{H} = \bigotimes_{n=1}^N \mathcal{H}_n, \qquad \mathcal{H}_n = \mathrm{span}\left\{ \vert s \rangle_n \right\}_{s=1}^\infty\]

As emphasized above, a separable Hilbert space is one that has a countable orthonormal basis, and is therefore isomorphic to L^2(\mathbb{R}), the space of square-normalizable functions.… [continue reading]

Abstracts for July 2017

  • Modewise entanglement of Gaussian states
    Alonso Botero and Benni Reznik
    We address the decomposition of a multimode pure Gaussian state with respect to a bipartite division of the modes. For any such division the state can always be expressed as a product state involving entangled two-mode squeezed states and single-mode local states at each side. The character of entanglement of the state can therefore be understood modewise; that is, a given mode on one side is entangled with only one corresponding mode of the other, and therefore the total bipartite entanglement is the sum of the modewise entanglement. This decomposition is generally not applicable to all mixed Gaussian states. However, the result can be extended to a special family of “isotropic” states, characterized by a phase space covariance matrix with a completely degenerate symplectic spectrum.

    It is well known that, despite the misleading imagery conjured by the name, entanglement in a multipartite system cannot be understood in terms of pair-wise entanglement of the parts. Indeed, there are only N(N-1)/2 pairs of N systems, but the number of qualitatively distinct types of entanglement scales exponentially in N. A good way to think about this is to recognize that a quantum state of a multipartite system is, in terms of parameters, much more akin to a classical probability distribution than a classical state. When we ask about the information stored in a probability distribution, there are lots and lots of “types” of information, and correlations can be much more complex than just knowing all the pairwise correlations. (“It’s not just that A knows something about B, it’s that A knows something about B conditional on a state of C, and that information can only be unlocked by knowing information from either D or E, depending on the state of F…”).
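As a sketch of the machinery in the abstract (my own toy code), one can build the covariance matrix of a two-mode squeezed vacuum and confirm that its symplectic spectrum is completely degenerate: all symplectic eigenvalues equal 1, as required of a pure Gaussian state. The (x1, p1, x2, p2) ordering and the hbar = 1, vacuum = identity normalization are my conventions.

```python
import numpy as np

r = 0.8                                    # two-mode squeezing parameter
c, s = np.cosh(2 * r), np.sinh(2 * r)

# Covariance matrix of the two-mode squeezed vacuum, ordering (x1, p1, x2, p2).
Z = np.diag([1.0, -1.0])
sigma = np.block([[c * np.eye(2), s * Z],
                  [s * Z, c * np.eye(2)]])

# Symplectic form; symplectic eigenvalues are |eigenvalues of i * Omega * sigma|.
omega1 = np.array([[0.0, 1.0], [-1.0, 0.0]])
Omega = np.block([[omega1, np.zeros((2, 2))],
                  [np.zeros((2, 2)), omega1]])
nu = np.sort(np.abs(np.linalg.eigvals(1j * Omega @ sigma)))
print(nu)   # each symplectic eigenvalue (listed twice) equals 1: a pure state
```

The mixed “isotropic” family in the abstract corresponds instead to a symplectic spectrum that is degenerate at some common value greater than 1.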

[continue reading]

Legendre transform

The way that most physicists teach and talk about partial differential equations is horrible, and has surprisingly big costs for the typical understanding of the foundations of the field even among professionals. The chief victims are students of thermodynamics and analytical mechanics, and I’ve mentioned before that the preface of Sussman and Wisdom’s Structure and Interpretation of Classical Mechanics is a good starting point for thinking about these issues. As a pointed example, in this blog post I’ll look at how badly the Legendre transform is taught in standard textbooks,I was pleased to note as this essay went to press that my choice of Landau, Goldstein, and Arnold was confirmed as the “standard” suggestions by the top Google results. a   and compare it to how it could be taught. In a subsequent post, I’ll use this as a springboard for complaining about the way we record and transmit physics knowledge.

Before we begin: turn away from the screen and see if you can remember what the Legendre transform accomplishes mathematically in classical mechanics.If not, can you remember the definition? I couldn’t, a month ago. b   I don’t just mean that the Legendre transform converts the Lagrangian into the Hamiltonian and vice versa, but rather: what key mathematical/geometric property does the Legendre transform have, compared to the cornucopia of other function transforms, that allows it to connect these two conceptually distinct formulations of mechanics?

(Analogously, the question “What is useful about the Fourier transform for understanding translationally invariant systems?” can be answered by something like “Translationally invariant operations in the spatial domain correspond to multiplication in the Fourier domain” or “The Fourier transform is a change of basis, within the vector space of functions, using translationally invariant basis elements, i.e., the Fourier modes”.)
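Before turning to the textbooks, here is the bare mathematical operation, sketched numerically (the grid and parameters are arbitrary choices of mine): the Legendre transform g(p) = sup_v [p v − f(v)] sends the free-particle Lagrangian L(v) = m v²/2 to the corresponding Hamiltonian H(p) = p²/2m.

```python
import numpy as np

def legendre(f, v_grid):
    """Numerical Legendre transform g(p) = sup_v [p*v - f(v)] over a grid."""
    return lambda p: np.max(p * v_grid - f(v_grid))

m = 2.0
lagrangian = lambda v: 0.5 * m * v**2        # L(v) = m v^2 / 2
v = np.linspace(-10.0, 10.0, 100_001)        # grid wide enough to contain the sup
hamiltonian = legendre(lagrangian, v)

p = 3.0
print(hamiltonian(p), p**2 / (2 * m))        # both equal 2.25
```

The key property, previewing the answer to the question above: for convex functions the transform is an involution, and it exchanges a function of velocity for a function of the conjugate slope p = dL/dv, so no information is lost moving between L and H.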

The status quo

Let’s turn to the canonical text by Goldstein for an example of how the Legendre transform is usually introduced.… [continue reading]

Links for May 2017

  • Methane hydrates will be the new shale gas. There is perhaps an order of magnitude more methane worldwide in hydrates than in shale deposits, but it’s harder to extract. “…it’s thought that only by 2025 at the earliest we might be able to look at realistic commercial options.”
  • Sperm whales have no (external) teeth on their upper jaw, which instead features holes into which the teeth on their narrow lower jaw fit.


  • Surprising and heartening to me: GiveWell finds that distributing antiretroviral therapy drugs to HIV positive patients (presumably in developing countries) is potentially cost-effective compared to their top recommendations.
  • Relatedly: the general flow of genetic information is DNA-RNA-protein. At a crude level, viruses are classified as either RNA viruses or DNA viruses depending on what sort of genetic material they carry. Generally, as parasites dependent on the host cell machinery, this determines where in the protein construction process they inject their payload. However, retroviruses (like HIV) are RNA viruses that bring along their own reverse transcriptase enzyme that, once inside the cell, converts their payload back into DNA and then grafts it into the host’s genome (which is then copied as part of the host cell’s lifecycle). Once this happens, it is very difficult to tell which cells have been infected and very difficult to root out the infection.
  • Claims about what makes Amazon’s vertical integration different:

    I remember reading about the common pitfalls of vertically integrated companies when I was in school. While there are usually some compelling cost savings to be had from vertical integration (either through insourcing services or acquiring suppliers/customers), the increased margins typically evaporate over time as the “supplier” gets complacent with a captive, internal “customer.”

    There are great examples of this in the automotive industry, where automakers have gone through alternating periods of supplier acquisitions and subsequent divestitures as component costs skyrocketed.

[continue reading]

Toward relativistic branches of the wavefunction

I prepared the following extended abstract for the Spacetime and Information Workshop as part of my continuing mission to corrupt physicists while they are still young and impressionable. I reproduce it here for your reading pleasure.


Finding a precise definition of branches in the wavefunction of closed many-body systems is crucial to conceptual clarity in the foundations of quantum mechanics. Toward this goal, we propose amplification, which can be quantified, as the key feature characterizing anthropocentric measurement; this immediately and naturally extends to non-anthropocentric amplification, such as the ubiquitous case of classically chaotic degrees of freedom decohering. Amplification can be formalized as the production of redundant records distributed over spatially disjoint regions, a certain form of multi-partite entanglement in the pure quantum state of a large closed system. If this definition can be made rigorous and shown to be unique, it is then possible to ask many compelling questions about how branches form and evolve.

A recent result shows that branch decompositions are highly constrained just by this requirement that they exhibit redundant local records. The set of all redundantly recorded observables induces a preferred decomposition into simultaneous eigenstates unless their records are highly extended and delicately overlapping, as exemplified by the Shor error-correcting code. A maximum length scale for records is enough to guarantee uniqueness. However, this result is grounded in a preferred tensor decomposition into independent microscopic subsystems associated with spatial locality. This structure breaks down in a relativistic setting on scales smaller than the Compton wavelength of the relevant field. Indeed, a key insight from algebraic quantum field theory is that finite-energy states are never exact eigenstates of local operators, and hence never have exact records that are spatially disjoint, although they can approximate this arbitrarily well on large scales.… [continue reading]
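A toy version of the redundancy criterion (my own illustration, using the idealized GHZ state rather than a realistic decoherence model): the branch label of a GHZ state is recorded in every qubit, so any pair of local measurements of the recorded observable agrees perfectly.

```python
import numpy as np

n = 4                                       # number of qubits carrying records
ghz = np.zeros(2 ** n)
ghz[0] = ghz[-1] = 1 / np.sqrt(2)           # (|00..0> + |11..1>) / sqrt(2)

Z = np.diag([1.0, -1.0])                    # the redundantly recorded observable

def local_op(op, site):
    """op acting on one site, identity on the rest."""
    out = np.ones((1, 1))
    for k in range(n):
        out = np.kron(out, op if k == site else np.eye(2))
    return out

# Every pair of sites carries perfectly correlated records:
#   <GHZ| Z_i Z_j |GHZ> = 1 for all i != j.
corrs = [ghz @ local_op(Z, i) @ local_op(Z, j) @ ghz
         for i in range(n) for j in range(i + 1, n)]
print(corrs)   # all six pairwise correlators equal 1
```

The Shor-code counterexample mentioned above is exactly a state whose records are extended and overlapping in a way that defeats this naive site-by-site reading.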

Links for April 2017

  • Why does a processor need billions of transistors if it’s only ever executing a few dozen instructions per clock cycle?
  • Nuclear submarines as refuges from global catastrophes.
  • “Elite Law Firms Cash in on Market Knowledge”:

    …corporate transactions such as mergers and acquisitions or financings are characterized by several salient facts that lack a complete theoretical account. First, they are almost universally negotiated through agents. Transactional lawyers do not simply translate the parties’ bargain into legally enforceable language; rather, they are actively involved in proposing and bargaining over the transaction terms. Second, they are negotiated in stages, often with the price terms set first by the parties, followed by negotiations primarily among lawyers over the remaining non-price terms. Third, while the transaction terms tend to be tailored to the individual parties, in negotiations the parties frequently resort to claims that specific terms are (or are not) “market.” Fourth, the legal advisory market for such transactions is highly concentrated, with a half-dozen firms holding a majority of the market share.

    [Our] claim is that, for complex transactions experiencing either sustained innovation in terms or rapidly changing market conditions, (1) the parties will maximize their expected surplus by investing in market information about transaction terms, even under relatively competitive conditions, and (2) such market information can effectively be purchased by hiring law firms that hold a significant market share for a particular type of transaction.

    …The considerable complexity of corporate transaction terms creates an information problem: One or both parties may simply be unaware of the complete set of surplus-increasing terms for the transaction, and of their respective outside options should negotiations break down. This problem is distinct from the classic problem of valuation uncertainty.

[continue reading]