Comments on Ollivier’s “Emergence of Objectivity for Quantum Many-Body Systems”

Harold Ollivier has put out a nice paper generalizing my best result:

We examine the emergence of objectivity for quantum many-body systems in a setting without an environment to decohere the system’s state, but where observers can only access small fragments of the whole system. We extend the result of Riedel (2017) to the case where the system is in a mixed state, measurements are performed through POVMs, and imprints of the outcomes are imperfect. We introduce a new condition on states and measurements to recover full classicality for any number of observers. We further show that evolutions of quantum many-body systems can be expected to yield states that satisfy this condition whenever the corresponding measurement outcomes are redundant.

Ollivier does a good job of summarizing why there is an urgent need to find a way to identify objectively classical variables in a many-body system without leaning on a preferred system-environment tensor decomposition. He also concisely describes the main results of my paper in somewhat different language, so some of you may find his version nicer to read.

(A minor quibble: Although this is of course a matter of taste, I disagree that the Shor-code example was the “core of the main result” of my paper. In my opinion, the key idea was that there was a sensible way of defining redundancy at all, in a way that allowed for proving statements about compatibility without recourse to a preferred non-microscopic tensor structure. The Shor-code example is more important for showing the limits of what redundancy can tell you, which are saturated in a weak sense.)

Compact precise definition of a transformer function

Although I’ve been repeatedly advised it’s not a good social strategy, a glorious way to start a research paper is with specific, righteous criticism of your anonymous colleagues (for readability, I have dropped the citations and section references from these quotes without marking the ellipses):

Transformers are deep feed-forward artificial neural networks with a (self)attention mechanism. They have been tremendously successful in natural language processing tasks and other domains. Since their inception 5 years ago, many variants have been suggested. Descriptions are usually graphical, verbal, partial, or incremental. Despite their popularity, it seems no pseudocode has ever been published for any variant. Contrast this to other fields of computer science, even to “cousin” discipline reinforcement learning.

So begin Phuong & Hutter in a great, rant-filled paper that “covers what Transformers are, how they are trained, what they’re used for, their key architectural components, tokenization, and a preview of practical considerations, and the most prominent models.” As an exercise, in this post I dig into the first item by writing down an even more compact definition of a transformer than theirs, in the form of a mathematical function rather than pseudocode, while avoiding the ambiguities rampant in the rest of the literature. I will consider only what a single forward pass of a transformer does, viewed as a map from token sequences to probability distributions over the token vocabulary. I do not try to explain the transformer, nor do I address other important aspects like motivation, training, and computational efficiency.
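To make the type signature concrete, here is a minimal NumPy sketch of one forward pass as a function from a list of token ids to a probability vector over the vocabulary. It is not the definition from this post or from Phuong & Hutter: the single attention head, causal mask, pre-layer-norm residual blocks without learned gains or biases, ReLU MLP, learned positional embeddings, and random weights are all simplifying assumptions, and `init_params` and `transformer` are hypothetical names.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token's vector to zero mean and unit variance (no learned gain/bias).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(vocab, d_model, d_mlp, n_layers, max_len, seed=0):
    # Hypothetical random weights; a trained model would supply these instead.
    rng = np.random.default_rng(seed)
    w = lambda *shape: rng.normal(0.0, 0.1, shape)
    layers = [dict(Wq=w(d_model, d_model), Wk=w(d_model, d_model),
                   Wv=w(d_model, d_model), Wo=w(d_model, d_model),
                   W1=w(d_model, d_mlp), W2=w(d_mlp, d_model))
              for _ in range(n_layers)]
    return dict(W_emb=w(vocab, d_model), W_pos=w(max_len, d_model),
                W_unemb=w(d_model, vocab), layers=layers)

def transformer(tokens, p):
    """Map a token sequence to a probability distribution over the next token."""
    n = len(tokens)
    x = p["W_emb"][tokens] + p["W_pos"][:n]           # (n, d_model) residual stream
    d = x.shape[1]
    future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True where key is after query
    for layer in p["layers"]:
        h = layer_norm(x)
        q, k, v = h @ layer["Wq"], h @ layer["Wk"], h @ layer["Wv"]
        scores = np.where(future, -np.inf, q @ k.T / np.sqrt(d))  # causal masking
        x = x + softmax(scores) @ v @ layer["Wo"]       # attention sublayer + residual
        h = layer_norm(x)
        x = x + np.maximum(0.0, h @ layer["W1"]) @ layer["W2"]  # MLP sublayer + residual
    logits = layer_norm(x[-1]) @ p["W_unemb"]          # read out at the last position
    return softmax(logits)

p = init_params(vocab=50, d_model=16, d_mlp=64, n_layers=2, max_len=32)
probs = transformer([3, 14, 15, 9], p)
print(probs.shape, probs.sum())
```

Everything a real model adds (multiple heads, biases, layer-norm parameters, dropout) changes the weights and wiring but not the overall signature: token ids in, one distribution over the vocabulary out.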

(This post also draws on a nice introduction by Turner. If you are interested in understanding and interpretation, you might check out, in descending order of sophistication, Elhage et al. …

Unital dynamics are mixedness increasing

After years of not having an intuitive interpretation of the unital condition on CP maps, I recently learned a beautiful one: unitality means the dynamics never decreases the state’s mixedness, in the sense of the majorization partial order.

Consider the Lindblad dynamics generated by a set of Lindblad operators L_k, corresponding to the Lindbladian

(1)   \begin{align*} \mathcal{L}[\rho] = \sum_k\left(L_k\rho L_k^\dagger - \{L_k^\dagger L_k,\rho\}/2\right) \end{align*}

and the resulting quantum dynamical semigroup \Phi_t[\rho] = e^{t\mathcal{L}}[\rho]. Let

(2)   \begin{align*} S_\alpha[\rho] = \frac{\ln\left(\mathrm{Tr}[\rho^\alpha]\right)}{1-\alpha}, \qquad \alpha\ge 0 \end{align*}

be the Renyi entropies, with S_{\mathrm{vN}}[\rho]:=\lim_{\alpha\to 1} S_\alpha[\rho] = -\mathrm{Tr}[\rho\ln\rho] the von Neumann entropy. Finally, let \prec denote the majorization partial order on density matrices: \rho\prec\rho' exactly when \mathrm{spec}[\rho]\prec\mathrm{spec}[\rho'], i.e., when \sum_{i=1}^r \lambda_i \le \sum_{i=1}^r \lambda_i^\prime for all r, where \lambda_i and \lambda_i^\prime are the respective eigenvalues in decreasing order. (In words: \rho\prec\rho' means \rho is more mixed than \rho'.) Then the following conditions are equivalent (none of this depends on the dynamics being Lindbladian: if you drop the first condition and the “t” subscript, so that \Phi is just some arbitrary, potentially non-divisible, CPTP map, the remaining conditions are all still equivalent):

  • \mathcal{L}[I]=0
  • \Phi_t[I]=I: “\Phi_t is a unital map (for all t)”
  • \frac{\mathrm{d}}{\mathrm{d}t}S_\alpha[\Phi_t[\rho]] \ge 0 for all \rho, t, and \alpha: “All Renyi entropies are non-decreasing”
  • \Phi_t[\rho]\prec\rho for all \rho and t: “\Phi_t is mixedness non-decreasing”
  • \Phi_t[\rho] = \sum_j p_j U^{(t)}_j\rho U^{(t)\dagger}_j for all \rho and t, with unitaries U^{(t)}_j and probabilities p_j that may depend on \rho as well as t.
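The entropy and majorization conditions can be spot-checked numerically. The NumPy sketch below (helper names are hypothetical; the dimension, probabilities, and random seed are arbitrary) evaluates the Renyi entropies of Eq. (2), tests majorization by comparing partial sums of decreasingly ordered eigenvalues, and applies the simplest kind of unital map, a fixed mixture of unitaries \Phi[\rho] = \sum_j p_j U_j\rho U_j^\dagger. On random input states, the outputs should be majorized by the inputs and have Renyi entropies at least as large. This is a check on a few random states, not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4  # Hilbert-space dimension (arbitrary choice)

def renyi(rho, alpha):
    # Eq. (2); alpha = 1 is treated as the von Neumann limit.
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]  # drop zero eigenvalues (this also gives S_0 = ln rank)
    if np.isclose(alpha, 1.0):
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam ** alpha)) / (1 - alpha))

def majorized_by(x, y, tol=1e-12):
    # True when x is majorized by y: partial sums of x's decreasing
    # eigenvalues never exceed those of y.
    a = np.sort(np.linalg.eigvalsh(x))[::-1]
    b = np.sort(np.linalg.eigvalsh(y))[::-1]
    return bool(np.all(np.cumsum(a) <= np.cumsum(b) + tol))

def random_unitary(d):
    # QR of a complex Gaussian matrix, with the phase fix that makes it Haar-distributed.
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def random_state(d):
    # Random full-rank density matrix G G^dag / Tr[G G^dag].
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

Us = [random_unitary(d) for _ in range(3)]
ps = [0.5, 0.3, 0.2]

def phi(rho):
    # Fixed mixture of unitaries: sum_j p_j U_j rho U_j^dag (unital by construction).
    return sum(p * (U @ rho @ U.conj().T) for p, U in zip(ps, Us))

unital = np.allclose(phi(np.eye(d)), np.eye(d))
alphas = [0.0, 0.5, 1.0, 2.0, 5.0]
checks = []
for _ in range(5):
    rho = random_state(d)
    out = phi(rho)
    checks.append(majorized_by(out, rho)
                  and all(renyi(out, a) >= renyi(rho, a) - 1e-10 for a in alphas))
print(unital, all(checks))
```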

The non-trivial equivalences above are proved in Sec. 8.3 of Wolf, “Quantum Channels and Operations Guided Tour”. (See also “On the universal constraints for relaxation rates for quantum dynamical semigroup” by Chruscinski et al. [2011.10159] for further interesting discussion.)

Note that having all Lindblad operators Hermitian (L_k = L_k^\dagger) implies, but is not implied by, the above conditions. Indeed, Hermiticity (or, more generally, normality) of the Lindblad operators is not preserved under the unitary gauge freedom L_k\to L_k^\prime = \sum_j u_{kj} L_j, which leaves the Lindbladian \mathcal{L} invariant for unitary u. …
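Both halves of that claim can be checked on a qubit. The NumPy sketch below (the helper `lindbladian` is hypothetical; the unitary u is an arbitrary illustrative choice) evaluates Eq. (1), confirms \mathcal{L}[I]=0 for the Hermitian operators \sigma_x, \sigma_z but not for the lowering operator |0\rangle\langle 1|, and then rotates \{\sigma_x, \sigma_z\} into a pair that generates the same Lindbladian, and so is still unital, yet whose first member is neither Hermitian nor normal.

```python
import numpy as np

def lindbladian(X, Ls):
    # Eq. (1): sum_k (L X L^dag - {L^dag L, X}/2)
    out = np.zeros(X.shape, dtype=complex)
    for L in Ls:
        Ld = L.conj().T
        out += L @ X @ Ld - 0.5 * (Ld @ L @ X + X @ Ld @ L)
    return out

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
lower = np.array([[0, 1], [0, 0]], dtype=complex)  # |0><1|, not normal

# Hermitian operators give L[I] = 0; the lowering operator gives L[I] = sigma_z.
print(np.allclose(lindbladian(I2, [sx, sz]), 0), np.allclose(lindbladian(I2, [lower]), sz))

# Gauge rotation L'_k = sum_j u_kj L_j with a 2x2 unitary u.
u = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)
Ls = [sx, sz]
Ls_new = [u[k, 0] * Ls[0] + u[k, 1] * Ls[1] for k in range(2)]

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))  # arbitrary test operator
L0 = Ls_new[0]
same = np.allclose(lindbladian(X, Ls), lindbladian(X, Ls_new))
hermitian = np.allclose(L0, L0.conj().T)
normal = np.allclose(L0 @ L0.conj().T, L0.conj().T @ L0)
unital = np.allclose(lindbladian(I2, Ls_new), 0)
print(same, hermitian, normal, unital)  # True False False True
```

Here L_1' = (\sigma_x + i\sigma_z)/\sqrt{2} satisfies L_1'L_1'^\dagger = I - \sigma_y but L_1'^\dagger L_1' = I + \sigma_y, while the second rotated operator contributes the opposite imbalance, so \mathcal{L}[I] still vanishes.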

AI goalpost moving is not unreasonable

[Summary: Constantly evolving tests for what counts as worryingly powerful AI are mostly a consequence of how hard it is to design tests that will identify the real-world power of future automated systems. I argue that Alan Turing in 1950 could not reliably distinguish a typical human from an appropriately fine-tuned GPT-4, yet none of our current automated systems can produce growth above historic trends. (A draft of this …)]

What does the phenomenon of “moving the goalposts” for what counts as AI tell us about AI?

It’s often said that people repeatedly revising their definition of AI, often in response to previous AI tests being passed, is evidence that people are denying/afraid of reality, and want to put their head in the sand or whatever. There’s some truth to that, but that’s a comment about humans and I think it’s overstated.

Closer to what I want to talk about is the idea that AI is continuously redefined to mean “whatever humans can do that hasn’t been automated yet”, which is often taken as evidence that AI is not a “natural” kind out there in the world, but rather just a category relative to current tech. There’s also truth to this, but it’s not exactly what I’m interested in.

To me, it is startling that (I claim) we have systems today that would likely pass the Turing test if administered by Alan Turing, but that have negligible impact on a global scale. More specifically, consider fine-tuning GPT-4 to mimic a typical human who lacks encyclopedic knowledge of the contents of the internet. Suppose that it’s mimicking a human with average intelligence whose occupation has no overlap with Alan Turing’s expertise. …

Notable reviews of arguments for AGI ruin

Here’s a collection of reviews of the arguments that artificial general intelligence represents an existential risk to humanity. They vary greatly in length and style. I may update this from time to time.

  • Here, from my perspective, are some different true things that could be said, to contradict various false things that various different people seem to believe, about why AGI would be survivable on anything remotely resembling the current pathway, or any other pathway we can easily jump to.
  • This report examines what I see as the core argument for concern about existential risk from misaligned artificial intelligence. I proceed in two stages. First, I lay out a backdrop picture that informs such concern. On this picture, intelligent agency is an extremely powerful force, and creating agents much more intelligent than us is playing with fire -- especially given that if their objectives are problematic, such agents would plausibly have instrumental incentives to seek power over humans. Second, I formulate and evaluate a more specific six-premise argument that creating agents of this kind will lead to existential catastrophe by 2070. On this argument, by 2070: (1) it will become possible and financially feasible to build relevantly powerful and agentic AI systems; (2) there will be strong incentives to do so; (3) it will be much harder to build aligned (and relevantly powerful/agentic) AI systems than to build misaligned (and relevantly powerful/agentic) AI systems that are still superficially attractive to deploy; (4) some such misaligned systems will seek power over humans in high-impact ways; (5) this problem will scale to the full disempowerment of humanity; and (6) such disempowerment will constitute an existential catastrophe.

Table of proposed macroscopic superpositions

Here is a table of proposals for creating enormous superpositions of matter. Importantly, all of them describe superpositions whose spatial extent is comparable to or larger than the size of the object itself. Many are quite speculative. I’d like to keep this table updated, so send me references if you think they should be included.

radius (nm)
size (nm)
rate (Hz)
KDTL[1-3]Oligoporphyrin,00∼1.02.7 × 104100,266100,001.2410,000.00
OTIMA[4-6]Gold (Au),0005.06.0 × 106100,079100,094.0010,600.00
Bateman et al.[7]Silicon (Si),0005.51.1 × 106100,150100,140.0010,000.50
Geraci et al.[8]Silica (SiO2),0006.51.6 × 106100,250100,250.0010,000.50
Wan et al.[9]Diamond (C),0095.07.5 × 109100,100100,000.0510,001.00
MAQRO[10-13]Silica (SiO2),0120.01.0 × 101000,100100,000.0010,000.01
Pino et al.[14]Niobium (Nb)1,000.02.2 × 101300,290100,450.0010,000.10
Stickler et al. …

KDTL note: To achieve their highest masses, the KDTL interferometer has superposed molecules of functionalized oligoporphyrin, a family of organic molecules composed of C, H, F, N, S, and Zn with molecular weights ranging from ~19,000 Da to ~29,000 Da. (The units here are daltons, also known as atomic mass units (amu), i.e., approximately the number of protons and neutrons.) The distribution is peaked around 27,000 Da.
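For the solid nanoparticles, the radius and mass entries should be related by m = \tfrac{4}{3}\pi r^3 \rho_{\rm bulk}. The Python sketch below is a rough consistency check under stated assumptions: the radius and mass readings listed in the code (for example, 5.0 nm and 6.0 × 10^6 Da for the OTIMA gold particle) and standard bulk densities. The actual proposals may assume somewhat different densities or shapes, and the molecular KDTL entry is omitted.

```python
import math

DALTON_G = 1.66054e-24  # grams per dalton

# (proposal and material, radius in nm, assumed bulk density in g/cm^3, tabulated mass in Da)
rows = [
    ("OTIMA, Au", 5.0, 19.3, 6.0e6),
    ("Bateman et al., Si", 5.5, 2.33, 1.1e6),
    ("Geraci et al., SiO2", 6.5, 2.2, 1.6e6),
    ("Wan et al., diamond", 95.0, 3.51, 7.5e9),
    ("MAQRO, SiO2", 120.0, 2.2, 1.0e10),
    ("Pino et al., Nb", 1000.0, 8.57, 2.2e13),
]

def sphere_mass_da(radius_nm, density_g_cm3):
    # Mass of a uniform solid sphere, converted from grams to daltons.
    r_cm = radius_nm * 1e-7
    return (4 / 3) * math.pi * r_cm**3 * density_g_cm3 / DALTON_G

for name, r, rho, m_tab in rows:
    m = sphere_mass_da(r, rho)
    print(f"{name:22s} computed {m:.2e} Da, table {m_tab:.1e} Da, ratio {m / m_tab:.2f}")
```

With these assumptions every ratio lands within roughly 15% of 1, which is about as close as rounded radii and nominal densities allow.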

GPT-3, PaLM, and look-up tables

[This topic is way outside my expertise. Just thinking out loud.]

Here is Google’s new language model PaLM having a think:

Alex Tabarrok writes:

It seems obvious that the computer is reasoning. It certainly isn’t simply remembering. It is reasoning and at a pretty high level! To say that the computer doesn’t “understand” seems little better than a statement of religious faith or speciesism…

It’s true that AI is just a set of electronic neurons none of which “understand” but my neurons don’t understand anything either. It’s the system that understands. The Chinese room understands in any objective evaluation and the fact that it fails on some subjective impression of what it is or isn’t like to be an AI or a person is a failure of imagination not an argument…

These arguments aren’t new but Searle’s thought experiment was first posed at a time when the output from AI looked stilted, limited, mechanical. It was easy to imagine that there was a difference in kind. Now the output from AI looks fluid, general, human. It’s harder to imagine there is a difference in kind.

Tabarrok uses an illustration of Searle’s Chinese room featuring a giant look-up table:

But as Scott Aaronson has emphasized, a machine that simply maps inputs to outputs by consulting a giant look-up table should not be considered “thinking” (although it could be considered to “know”). First, such a look-up table would be beyond astronomically large for any interesting AI task and hence physically infeasible to implement in the real universe. But more importantly, the fact that something is being looked up rather than computed undermines the idea that the system understands or is reasoning. …
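To put a number on “beyond astronomically large”: a look-up table needs one entry per possible input. The Python sketch below assumes GPT-3’s published byte-pair vocabulary of 50,257 tokens and its 2,048-token context window, and counts only full-length inputs, working with base-10 logarithms to avoid overflow.

```python
import math

# Assumed figures: GPT-3's vocabulary size and context length.
vocab_size = 50_257
context_len = 2_048

# One table entry per possible input sequence: vocab_size ** context_len entries.
log10_entries = context_len * math.log10(vocab_size)
print(f"about 10^{log10_entries:.0f} entries")  # about 10^9628 entries
```

For comparison, the number of atoms in the observable universe is commonly estimated at around 10^80.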

Lindblad operator trace is 1st-order contribution to Hamiltonian part of reduced dynamics

In many derivations of the Lindblad equation, the authors say something like “There is a gauge freedom in our choice of Lindblad (‘jump’) operators that we can use to make those operators traceless for convenience.” (A gauge freedom of the Lindblad equation means a transformation we can apply to both the Lindblad operators and, possibly, the system’s self-Hamiltonian without changing the reduced dynamics.) However, the nature of this freedom and convenience is often obscure to non-experts.

While reading Hayden & Sorce’s nice recent paper [arXiv:2108.08316] motivating the choice of traceless Lindblad operators, I noticed for the first time that the trace-ful parts of Lindblad operators are just the contributions to the Hamiltonian part of the reduced dynamics that arise at first order in the system-environment interaction. In contrast, the so-called “Lamb shift” Hamiltonian is second order.
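The gauge transformation behind this observation is a standard identity: writing a jump operator as L = L_0 + aI with a = \mathrm{Tr}[L]/d, so that L_0 is traceless, the dissipator satisfies \mathcal{D}[L] = \mathcal{D}[L_0] - i[\Delta H, \cdot\,] with \Delta H = \tfrac{i}{2}(a^* L_0 - a L_0^\dagger). In other words, the trace part of L acts exactly like an extra Hermitian Hamiltonian term. The NumPy sketch below checks this for a random Hamiltonian and a random jump operator; the helper name `generator` is hypothetical.

```python
import numpy as np

def generator(X, H, Ls):
    # Full Lindblad generator: -i[H, X] + sum_k (L X L^dag - {L^dag L, X}/2)
    out = -1j * (H @ X - X @ H)
    for L in Ls:
        Ld = L.conj().T
        out = out + L @ X @ Ld - 0.5 * (Ld @ L @ X + X @ Ld @ L)
    return out

rng = np.random.default_rng(2)
d = 3
g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H = (g + g.conj().T) / 2                                     # random Hermitian Hamiltonian
L = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))   # random jump operator (nonzero trace)

a = np.trace(L) / d                              # trace part of L
L0 = L - a * np.eye(d)                           # traceless jump operator
dH = 0.5j * (np.conj(a) * L0 - a * L0.conj().T)  # Hamiltonian term carried by the trace part
H0 = H + dH                                      # Hamiltonian paired with L0

X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))  # arbitrary test operator
print(np.allclose(generator(X, H, [L]), generator(X, H0, [L0])))  # True
print(abs(np.trace(L0)) < 1e-12, np.allclose(dH, dH.conj().T))   # True True
```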

Consider a system-environment decomposition \mathcal{S}\otimes \mathcal{E} of the Hilbert space with a global Hamiltonian H = H_S + H_{I} + H_E, where H_S = H_S \otimes I_\mathcal{E}, H_E = I_\mathcal{S}\otimes H_E, and H_I = \epsilon \sum_\alpha A_\alpha \otimes B_\alpha are the system’s self-Hamiltonian, the environment’s self-Hamiltonian, and the interaction, respectively. Here, we have (without loss of generality) decomposed the interaction Hamiltonian into a sum of tensor products of operators drawn from Hilbert-Schmidt-orthogonal sets \{A_\alpha\} and \{B_\alpha\}, with \epsilon a real parameter that controls the strength of the interaction.
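The “without loss of generality” step is an operator Schmidt decomposition, which can be computed from a singular value decomposition of the realigned matrix of H_I. In this NumPy sketch (the helper `operator_schmidt` is hypothetical), the singular values s_\alpha are the weights of the terms and can be absorbed, together with \epsilon, into the A_\alpha. The returned factors are Hilbert-Schmidt orthonormal but are not forced to be Hermitian; choosing Hermitian factors takes an extra step not shown here.

```python
import numpy as np

def operator_schmidt(H, dS, dE, tol=1e-12):
    # Realign H so that M[(i, j), (k, l)] = H[(i, k), (j, l)], with system
    # indices i, j and environment indices k, l, then take the SVD.
    M = H.reshape(dS, dE, dS, dE).transpose(0, 2, 1, 3).reshape(dS * dS, dE * dE)
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    idx = np.flatnonzero(s > tol * s[0])  # drop numerically zero singular values
    A = [U[:, a].reshape(dS, dS) for a in idx]
    B = [Vh[a, :].reshape(dE, dE) for a in idx]
    return s[idx], A, B

rng = np.random.default_rng(3)
dS, dE = 2, 3
G = rng.normal(size=(dS * dE,) * 2) + 1j * rng.normal(size=(dS * dE,) * 2)
H_I = (G + G.conj().T) / 2  # random Hermitian interaction on S (x) E

s, A, B = operator_schmidt(H_I, dS, dE)
rebuilt = sum(sv * np.kron(a, b) for sv, a, b in zip(s, A, B))  # sum_alpha s_alpha A_alpha (x) B_alpha
gram_A = np.array([[np.trace(x.conj().T @ y) for y in A] for x in A])
gram_B = np.array([[np.trace(x.conj().T @ y) for y in B] for x in B])
print(len(s), np.allclose(rebuilt, H_I), np.allclose(gram_A, np.eye(len(A))), np.allclose(gram_B, np.eye(len(B))))
```

The number of terms is at most \min(d_S^2, d_E^2), here 4.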

This Hamiltonian decomposition is not unique in the sense that we can always send H_S \to H_S + \Delta H_S and H_I \to H_I - \Delta H_S, where \Delta H_S = \Delta H_S \otimes I_\mathcal{E} is any Hermitian operator acting only on the system. (There is also a similar freedom with the environment, in the sense that we can send H_E \to H_E + \Delta H_E and H_I \to H_I - \Delta H_E.) When reading popular derivations of the Lindblad equation

(1)   \begin{align*} \partial_t \rho_{\mathcal{S}} = -i[\tilde{H}_{\mathcal{S}}, \rho_{\mathcal{S}}] + \sum_i\left[L_i \rho_{\mathcal{S}} L_i^\dagger - (L_i^\dagger L_i \rho_{\mathcal{S}} + \rho_{\mathcal{S}} L_i^\dagger L_i)/2\right] \end{align*}

like in the textbook by Breuer & Petruccione, one could be forgiven (specifically, I have forgiven myself for doing this…) for thinking that this freedom is eliminated by the need to satisfy the assumption \mathrm{Tr}_\mathcal{E}[H_I(t),\rho(0)]=0, which is crucially deployed in the “microscopic” derivation of the operators \tilde{H}_{\mathcal{S}} and \{L_i\} in the Lindblad equation from the global dynamics generated by H. …
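For a product initial state \rho(0) = \rho_S\otimes\rho_E, the freedom is in fact what lets one satisfy that assumption: moving \Delta H_S = \epsilon\sum_\alpha \mathrm{Tr}[B_\alpha\rho_E]\, A_\alpha out of the interaction replaces each B_\alpha with B_\alpha - \mathrm{Tr}[B_\alpha\rho_E]\, I, whose mean in \rho_E vanishes. The NumPy sketch below checks this at t = 0 with random operators (helper names are hypothetical); for an environment state that is stationary under H_E, the same shift works at all t.

```python
import numpy as np

rng = np.random.default_rng(4)
dS, dE, eps = 2, 3, 0.1

def rand_herm(d):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (g + g.conj().T) / 2

def rand_state(d):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def ptrace_env(X):
    # Tr_E of an operator on S (x) E, system factor first as in np.kron(A, B).
    return np.trace(X.reshape(dS, dE, dS, dE), axis1=1, axis2=3)

def interaction(As, Bs):
    # H_I = eps * sum_alpha A_alpha (x) B_alpha
    return eps * sum(np.kron(a, b) for a, b in zip(As, Bs))

A = [rand_herm(dS) for _ in range(2)]
B = [rand_herm(dE) for _ in range(2)]
rho_S, rho_E = rand_state(dS), rand_state(dE)
rho0 = np.kron(rho_S, rho_E)  # product initial state

def residual(H_I):
    # Largest entry of Tr_E [H_I, rho(0)], the quantity assumed to vanish.
    return np.abs(ptrace_env(H_I @ rho0 - rho0 @ H_I)).max()

means = [np.trace(b @ rho_E).real for b in B]     # <B_alpha> in rho_E
dH_S = eps * sum(m * a for m, a in zip(means, A))  # shifted into the system Hamiltonian
H_I = interaction(A, B)
H_I_new = interaction(A, [b - m * np.eye(dE) for b, m in zip(B, means)])

print(residual(H_I) > 1e-6, residual(H_I_new) < 1e-12)           # True True
print(np.allclose(H_I - np.kron(dH_S, np.eye(dE)), H_I_new))      # True
```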