Branches as hidden nodes in a neural net

I had been vaguely aware that there was an important connection between tensor-network representations of quantum many-body states (e.g., matrix product states) and artificial neural nets, but it didn’t really click until I saw Roger Melko’s nice talk on Friday about his recent paper with Torlai et al.:[1]

[Download MP4]   [Other options]

In particular, he sketched the essential equivalence between matrix product states (MPS) and restricted Boltzmann machines (RBM)[2] before showing how he and his collaborators could train efficient RBM representations of the states of the transverse-field Ising and XXZ models using a small number of local measurements on the true state.
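For concreteness, here is a minimal sketch (in Python/NumPy, with made-up random weights) of the standard RBM wavefunction ansatz; since the hidden units only couple to the visible spins, they can be summed out analytically:

```python
import numpy as np

def rbm_amplitude(sigma, a, b, W):
    """Unnormalized RBM wavefunction amplitude psi(sigma).

    Hidden units h_j in {-1,+1} are summed out analytically:
    psi(sigma) = exp(a . sigma) * prod_j 2*cosh(b_j + (W @ sigma)_j).
    """
    theta = b + W @ sigma
    return np.exp(a @ sigma) * np.prod(2 * np.cosh(theta))

# Toy example: 4 visible spins, 2 hidden units, random (real) weights.
rng = np.random.default_rng(0)
n_vis, n_hid = 4, 2
a = rng.normal(scale=0.1, size=n_vis)
b = rng.normal(scale=0.1, size=n_hid)
W = rng.normal(scale=0.1, size=(n_hid, n_vis))

# Enumerate all 2^4 spin configurations and normalize the state.
configs = np.array([[1 if (s >> i) & 1 else -1 for i in range(n_vis)]
                    for s in range(2 ** n_vis)])
psi = np.array([rbm_amplitude(s, a, b, W) for s in configs])
psi /= np.linalg.norm(psi)
print(psi)  # normalized amplitudes over the computational basis
```

Training adjusts the weights (a, b, W) so that this variational family approximates the target state; with real weights, as here, all amplitudes are positive, which suffices for stoquastic Hamiltonians like the transverse-field Ising model.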

As you’ve heard me belabor ad nauseam, I think identifying and defining branches is the key outstanding task inhibiting progress toward resolving the measurement problem. I had already been thinking of branches as a sort of “global” tensor in an MPS, i.e., a single index (bond) that would label the branches and serve to efficiently encode a pure state with long-range entanglement due to the amplification that defines a physical measurement process. (More generally, you can imagine branching events whose effects haven’t propagated outside of some region, such as a light cone or Lieb-Robinson bound, and you might even make a hand-wavy connection to entanglement renormalization.) But I had little experience with constructing MPSs, and finding efficient representations always seemed like an ad hoc process yielding non-unique results. Demonstrating uniqueness is crucial since it is equivalent to the set-selection problem in the language of consistent histories, and it avoids problems (like Wigner’s friend or modern refinements) that otherwise just have to be ignored by fiat. My earlier result showed that the principle of redundant records in quantum Darwinism goes a long way toward establishing uniqueness and, importantly, observed that “branch pruning” might enable the efficient simulation of non-equilibrium systems. However, getting an (even approximate) definition of irreversibility remains a crucial challenge; without a notion of irreversibility, record-defined branches would be too fine-grained and transient.
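As a toy illustration of the “single bond labeling the branches” picture (my construction, not anything from the talk): the post-measurement GHZ-like state (|00…0⟩ + |11…1⟩)/√2 is a bond-dimension-2 MPS in which the one shared bond value indexes the two branches.

```python
import numpy as np

def ghz_mps(n):
    """Bond-dimension-2 MPS for the n-qubit GHZ state.

    The bond index (0 or 1) is perfectly correlated along the chain and
    plays the role of a global branch label: value 0 gives |00...0>,
    value 1 gives |11...1>.
    """
    # Site tensor A[physical, left_bond, right_bond], diagonal in the bond.
    A = np.zeros((2, 2, 2))
    A[0, 0, 0] = 1.0
    A[1, 1, 1] = 1.0
    # Boundary vectors give the two branches equal amplitude.
    left = np.array([1.0, 1.0]) / np.sqrt(2)
    right = np.array([1.0, 1.0])
    psi = left  # running tensor: (...physical indices..., bond)
    for _ in range(n):
        psi = np.tensordot(psi, A, axes=([-1], [1]))
    return psi @ right  # close the chain; shape (2,)*n

state = ghz_mps(3).reshape(-1)
print(state)  # amplitudes for |000>, |001>, ..., |111>
```

Summing over the bond reproduces the full superposition; fixing the bond value to 0 or 1 instead picks out a single branch. Real branch structure is of course far messier than this cartoon, but the key feature — a single global index separating macroscopically distinct components — is the same.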

Even with uniqueness and irreversibility, there is not much reason to think that a records-based definition of branches would enable branches to be found efficiently in a numerical simulation. This is what makes the connection to artificial neural nets so exciting. There is already a huge literature on training these nets to find key underlying structure in probability distributions (which are mathematically highly similar to wavefunctions). We won’t have to reinvent the wheel.

Now, for various practical experimental and theoretical reasons, condensed matter theorists focus mostly on equilibrium states. Even outside this, they mostly care about approach to equilibrium and how quantum systems evolve when they are perturbed slightly away from equilibrium. And even when they care about persistent non-equilibrium, they focus on permanent non-equilibrium due to conserved quantities (either simply global conserved quantities like particle number or, more interestingly, the local conserved quantities that lead to Anderson localization). Branches, on the other hand, are all about persistent but non-trivial and non-permanent out-of-equilibrium evolution; we expect branch structure to eventually break down during thermalization. Therefore, rather than using restricted Boltzmann machines, which find equilibrium probability distributions with Boltzmann weights, we’ll probably need a deep learning buzzword with “temporal” or “recurrent” in the name.

So, here are my predictions as concretely as I can make them right now: Efficient simulation of many-body systems that are far from equilibrium, but not permanently out of equilibrium, will use some sort of neural-net structure in which the values of key hidden nodes label the branches. The branches will correspond to orthogonal states of the system that are macroscopically distinct in the sense that they can be distinguished by local measurements at many spatially disjoint regions. Different hidden nodes will correspond to different branching events, so that the set of all branches is labeled by the joint values of all such nodes. The branches will monotonically become more fine-grained (when compared at a fixed time à la the Heisenberg picture), and in particular the number of branches will increase exponentially with time because the number of branching events will increase linearly with time (and space). Local, experimentally accessible observables will be estimated with high precision by simply sampling from all branches. On the timescale of thermalization, the branch structure “dissolves” by becoming progressively more inefficient to find numerically, and eventually non-unique.
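A cartoon of the sampling prediction (an entirely hypothetical setup: the branch decomposition is put in by hand here, rather than found by a trained net): estimate a local observable by sampling branches according to their Born weights and averaging the within-branch expectation values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical branch decomposition of a 3-qubit state: two orthogonal
# product-state branches with Born weights p_k = |amplitude_k|^2.
up, down = np.array([1.0, 0.0]), np.array([0.0, 1.0])
branches = [
    [up, up, up],       # branch 0: all spins up
    [down, down, down]  # branch 1: all spins down
]
weights = np.array([0.5, 0.5])

Z = np.diag([1.0, -1.0])  # local observable: Pauli Z on one site

def branch_expectation(branch, site, op):
    """Expectation of a single-site operator within one product branch."""
    v = branch[site]
    return v @ op @ v

# Monte Carlo estimate: sample branch labels by weight, average the
# local expectation value over the sampled branches.
samples = rng.choice(len(branches), size=10_000, p=weights)
estimate = np.mean([branch_expectation(branches[k], 0, Z) for k in samples])

exact = sum(w * branch_expectation(b, 0, Z) for w, b in zip(weights, branches))
print(estimate, exact)  # estimate fluctuates around exact = 0.0
```

Because the branches are macroscopically distinct, each sampled branch is cheap to evaluate locally, and the cross terms between branches contribute nothing to local observables.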

Edit 2017-11-1: Branch identification will only be asymptotically crucial for simulation when (1) the number of branching events (i.e., hidden branch nodes) is extensive in system size at any given time (so that the total number of branches is exponential in space as well as time, as mentioned above), and (2) the effects of branching events have had time to propagate globally.[3] Thus, for an $N$-site lattice simulated for $T$ time steps, this is the large-$N$ limit taken while holding $\alpha \equiv Tc/N$ constant, where $c$ is the speed of propagation and $\alpha > 1$ is the number of trips around the lattice a perturbation can travel. If we instead hold $T$ constant and take $N$ to infinity, then eventually the correlation length has to saturate (since perturbations simply don’t have time to propagate further than $cT$ lattice sites away) and the resources required no longer scale exponentially with $N$. But obviously, fixed modest values of $T$ (or fixed modest values of $N$ for sufficiently large $T$) may already require infeasible computational resources.
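To make the two limits explicit, here is a back-of-the-envelope sketch; the per-site branching rate is a made-up parameter, and this is pure bookkeeping, not a simulation:

```python
def branching_regime(N, T, c=1.0, rate=0.1):
    """Crude bookkeeping for the two limits discussed above.

    events: number of branching events, assumed extensive in both space
            and time (rate is a hypothetical per-site, per-step rate),
            so the branch count grows like 2**events.
    alpha:  T*c/N, the number of trips around the lattice a perturbation
            can travel; branching effects have had time to propagate
            globally only when alpha > 1.
    """
    events = rate * N * T
    alpha = T * c / N
    return events, alpha, alpha > 1

# Fixed alpha > 1 as N grows: branch count ~ 2**events blows up.
print(branching_regime(N=100, T=200))   # (2000.0, 2.0, True)
# Fixed T with larger N: alpha < 1, correlations saturate at distance c*T.
print(branching_regime(N=1000, T=200))  # (20000.0, 0.2, False)
```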



  1. There is a title card about “resurgence” from Francesco Di Renzo’s talk at the beginning of the talk you can ignore. This is just a mistake in KITP’s video system.
  2. This is discussed in detail by Chen et al. See also good intuition and a helpful physicist-statistician dictionary from Lin and Tegmark.
  3. I thank Martin Ganahl for discussion on this point.