[This post describes ideas generated in discussion with Markus Hauru, Curt von Keyserlingk, and Daniel Ranard.]
An original dream of defining branches based on redundant records (aka redundant classical information, aka GHZ-like correlations) was that it would be possible to decompose the wavefunction of an evolving non-integrable quantum system at each point in time into macroscopically distinguishable branches that individually had bounded amounts of long-range entanglement (i.e., could be efficiently expressed as a matrix product state) even though the amount of long-range entanglement for the overall state diverges in time. If one could numerically perform such a decomposition, and if the branches only “fine-grain in time”, then one could classically sample from the branches to accurately estimate local observables even if the number of branches increases exponentially in time (which we expect it to do).
However, we now think that only a fairly small fraction of all long-range entanglement can be attributed to redundantly recorded branches. Thus, even if we found and efficiently handled all such classical information using a decomposition into a number of branches that was increasing exponentially in time (polynomial branch entropy), most branches would still have an entanglement entropy across any spatial partition that grew ~linearly in time (i.e., exponentially increasing bond dimension in the MPS representation) until saturating.
In this post I’ll first write down a simple model that suggests the need to generalize the idea of branches in order to account for most long-range entanglement. Then I will give some related reasons to think that this generalized structure will take the form not of a preferred basis, but rather preferred subspaces and subsystems, and together these will combine into a preferred “branch algebra”.… [continue reading]
[This post describes ideas generated in discussion with Markus Hauru, Curt von Keyserlingk, and Daniel Ranard.]
Taylor & McCulloch have a tantalizing paper about which I’ll have much to say in the future. However, for now I want to discuss the idea of the “compatibility” of branch decompositions, which is raised in their appendix. In particular, the differences between their approach and mine prompted me to think more about how we could narrow down what sorts of logical axioms for branches could be identified even before we pin down a physical definition. Indeed, as I will discuss below, the desire for compatibility raises the hope that some natural axioms for branches might enable the construction of a preferred decomposition of the Hilbert space into branching subspaces, and that this might be done independently of the particular overall wavefunction. However, the axioms that I write down prove insufficient for this task.
Logical branch axioms
Suppose we have a binary relation “$\perp$” on the vectors in a (finite-dimensional) Hilbert space that indicates that two vectors (states), when superposed, should be considered to live on distinct branches. I will adopt the convention that “$\psi \perp \phi$” is interpreted to assert that $\psi, \phi \neq 0$ and that the branch relation holds. … [continue reading]
This post explains the relationship between (objective) collapse theories and theories of wavefunction branching. The formalizations are mathematically very simple, but it’s surprisingly easy to get confused about observational consequences unless they are laid out explicitly.
(Below I work in the non-relativistic case.)
Branching: An augmentation
In its most general form, a branching theory is some time-dependent orthogonal decomposition of the wavefunction: $\psi = \sum_{\phi \in B(t)} \phi$, where $B(t)$ is some time-dependent set of orthogonal vectors. I’ve expressed this in the Heisenberg picture, but the Schrödinger-picture wavefunction and branches are obtained in the usual (non-branch-dependent) way by evolution with the overall unitary: $\psi^{\mathrm{S}}(t) = U(t)\psi$ and $\phi^{\mathrm{S}}(t) = U(t)\phi$.
We generally expect the branches to fine-grain in time. That is, for any two times $t < t'$, it must be possible to partition the branches at the later time into subsets of child branches, each labeled by a parent branch at the earlier time, so that each subset of children sums up to its corresponding earlier-time parent: $\phi = \sum_{\phi' \in C_\phi} \phi'$ for all $\phi \in B(t)$, where $C_\phi \subseteq B(t')$ and $C_\phi \cap C_{\tilde\phi} = \emptyset$ for $\phi \neq \tilde\phi$. By the orthogonality, a child will be a member of the subset corresponding to a parent if and only if the overlap of the child and the parent is non-zero. In other words, a branching theory fine-grains in time if the elements of $B(t)$ and $B(t')$ are formed by taking partitions $P$ and $P'$ of the same set of orthogonal vectors, where $P'$ is a refinement of $P$, and vector-summing each subset of the respective partition.… [continue reading]
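To make the fine-graining condition concrete, here is a minimal numpy sketch (my own illustration, not from the post; all names are hypothetical). It builds orthonormal child branches, forms parents by summing disjoint subsets, and checks the condition by assigning each child to the unique parent it overlaps:

```python
import numpy as np

# Toy example: 4 orthonormal "child" branches in an 8-dimensional Hilbert
# space, with 2 "parent" branches formed by summing disjoint subsets, so
# fine-graining holds by construction.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(8, 4)) + 1j * rng.normal(size=(8, 4)))
children = Q.T                                  # rows are orthonormal vectors
parents = np.stack([children[0] + children[1],
                    children[2] + children[3]])

def fine_grains(parents, children, tol=1e-10):
    """Check the fine-graining condition: each child overlaps exactly one
    parent, and each parent is the vector sum of the children it overlaps."""
    overlaps = np.abs(parents.conj() @ children.T) > tol  # (n_parents, n_children)
    if not np.all(overlaps.sum(axis=0) == 1):
        return False         # some child belongs to zero or multiple parents
    return all(np.allclose(parents[i], children[overlaps[i]].sum(axis=0))
               for i in range(len(parents)))

print(fine_grains(parents, children))   # True
```

By orthogonality, the boolean overlap matrix is exactly the child-to-parent assignment described above, so no explicit bookkeeping of subsets is needed.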
Harold Ollivier has put out a nice paper generalizing my best result:
We examine the emergence of objectivity for quantum many-body systems in a setting without an environment to decohere the system’s state, but where observers can only access small fragments of the whole system. We extend the result of Riedel (2017) to the case where the system is in a mixed state, measurements are performed through POVMs, and imprints of the outcomes are imperfect. We introduce a new condition on states and measurements to recover full classicality for any number of observers. We further show that evolutions of quantum many-body systems can be expected to yield states that satisfy this condition whenever the corresponding measurement outcomes are redundant.
Ollivier does a good job of summarizing why there is an urgent need to find a way to identify objectively classical variables in a many-body system without leaning on a preferred system-environment tensor decomposition. He also concisely describes the main results of my paper in somewhat different language, so some of you may find his version nicer to read. … [continue reading]
Although I’ve been repeatedly advised it’s not a good social strategy, a glorious way to start a research paper is with specific, righteous criticism of your anonymous colleagues:
Transformers are deep feed-forward artificial neural networks with a (self)attention mechanism. They have been tremendously successful in natural language processing tasks and other domains. Since their inception 5 years ago, many variants have been suggested. Descriptions are usually graphical, verbal, partial, or incremental. Despite their popularity, it seems no pseudocode has ever been published for any variant. Contrast this to other fields of computer science, even to “cousin” discipline reinforcement learning.
So begin Phuong & Hutter in a great, rant-filled paper that “covers what Transformers are, how they are trained, what they’re used for, their key architectural components, tokenization, and a preview of practical considerations, and the most prominent models.” As an exercise, in this post I’ll dig into the first item by writing down an even more compact definition of a transformer than theirs, in the form of a mathematical function rather than pseudocode, while avoiding the ambiguities rampant in the rest of the literature. I consider only what a single forward pass of a transformer does, treated as a map from token sequences to probability distributions over the token vocabulary. I do not try to explain the transformer, nor do I address other important aspects like motivation, training, and computational cost.
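For a taste of what such a compact definition looks like, here is a rough numpy sketch of a single decoder-only forward pass. This is my own toy version, not the definition from the post or the paper: it uses one attention head per layer and omits layer normalization, biases, and multi-head machinery, and all parameter names are invented.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def transformer(tokens, params):
    """Map a sequence of token ids to a probability distribution over the
    next token. One head per layer; layer norm and biases omitted."""
    W_e, W_p = params["W_e"], params["W_p"]      # (vocab, d), (max_len, d)
    x = W_e[tokens] + W_p[:len(tokens)]          # (t, d) token + position embeddings
    for layer in params["layers"]:
        # causal self-attention
        q, k, v = (x @ layer[m] for m in ("W_q", "W_k", "W_v"))
        scores = q @ k.T / np.sqrt(k.shape[-1])               # (t, t)
        scores += np.triu(np.full(scores.shape, -np.inf), k=1)  # mask future tokens
        x = x + softmax(scores) @ v @ layer["W_o"]
        # position-wise two-layer MLP with ReLU
        x = x + np.maximum(x @ layer["W_1"], 0) @ layer["W_2"]
    logits = x[-1] @ W_e.T                       # weight-tied unembedding
    return softmax(logits)

# Toy usage with random weights
rng = np.random.default_rng(0)
V, T, d = 10, 8, 4
params = {
    "W_e": rng.normal(size=(V, d)),
    "W_p": rng.normal(size=(T, d)),
    "layers": [{m: 0.1 * rng.normal(size=(d, d))
                for m in ("W_q", "W_k", "W_v", "W_o", "W_1", "W_2")}],
}
p = transformer(np.array([1, 2, 3]), params)
print(p.shape)   # (10,)
```

The entire map fits in one function precisely because each layer is just two residual updates (attention, then MLP), which is the kind of compression the post is after.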
(This post also draws on a nice introduction by Turner. If you are interested in understanding and interpretation, you might check out — in descending order of sophistication — Elhage et al.… [continue reading]
After years of not having an intuitive interpretation of the unital condition on CP maps, I recently learned a beautiful one: unitality means the dynamics never decreases the state’s mixedness, in the sense of the majorization partial order.
Consider the Lindblad dynamics generated by a set of Lindblad operators $\{L_k\}$, corresponding to the Lindbladian
$\mathcal{L}[\rho] = \sum_k \left( L_k \rho L_k^\dagger - \tfrac{1}{2} \{ L_k^\dagger L_k, \rho \} \right)$
and the resulting quantum dynamical semigroup $\Phi_t = e^{t\mathcal{L}}$. Let
$S_\alpha(\rho) = \frac{1}{1-\alpha} \ln \mathrm{Tr}[\rho^\alpha]$
be the Renyi entropies, with $S_1 = \lim_{\alpha \to 1} S_\alpha$ the von Neumann entropy. Finally, let $\prec$ denote the majorization partial order on density matrices: $\sigma \prec \rho$ exactly when $\sum_{i=1}^{j} \lambda_i^{(\sigma)} \le \sum_{i=1}^{j} \lambda_i^{(\rho)}$ for all $j$, where $\lambda_i^{(\sigma)}$ and $\lambda_i^{(\rho)}$ are the respective eigenvalues in decreasing order. (In words: $\sigma \prec \rho$ means $\sigma$ is more mixed than $\rho$.) Then the following conditions are equivalent:
- $\Phi_t[I] = I$: “$\Phi_t$ is a unital map (for all $t$)”
- $\frac{\mathrm{d}}{\mathrm{d}t} S_\alpha(\Phi_t[\rho]) \ge 0$ for all $\alpha$, $\rho$, and $t$: “All Renyi entropies are non-decreasing”
- $\Phi_t[\rho] \prec \rho$ for all $\rho$: “$\Phi_t$ is mixedness non-decreasing”
- $\Phi_t[\rho] = \sum_j p_j U_j \rho U_j^\dagger$ for all $\rho$ and some unitaries $U_j$ and probabilities $p_j$.
The non-trivial equivalences above are proved in Sec. 8.3 of Wolf, “Quantum Channels and Operations: Guided Tour”.
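A quick numerical illustration of one direction of the equivalence (my own sketch, not from the post): a mixture of random unitaries (the last condition above) is a unital channel, and its output is indeed majorized by its input, i.e., more mixed.

```python
import numpy as np

def majorizes(rho, sigma, tol=1e-10):
    """True iff rho majorizes sigma (sigma ≺ rho, i.e., sigma is more mixed):
    partial sums of decreasing eigenvalues of rho dominate those of sigma."""
    a = np.sort(np.linalg.eigvalsh(rho))[::-1]
    b = np.sort(np.linalg.eigvalsh(sigma))[::-1]
    return bool(np.all(np.cumsum(a) >= np.cumsum(b) - tol))

rng = np.random.default_rng(1)
d = 4

# Random density matrix (positive, unit trace)
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = M @ M.conj().T
rho /= np.trace(rho).real

# Mixture of random (Haar-ish) unitaries: rho -> sum_j p_j U_j rho U_j^dagger
ps = [0.5, 0.3, 0.2]
Us = [np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))[0]
      for _ in ps]
out = sum(p * U @ rho @ U.conj().T for p, U in zip(ps, Us))

print(majorizes(rho, out))   # True: the output is more mixed than the input
```

Here majorization is checked directly on sorted spectra, which is all the partial order refers to.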
Note that having all Hermitian Lindblad operators ($L_k = L_k^\dagger$) implies, but is not implied by, the above conditions. Indeed, the condition of Lindblad-operator Hermiticity (or, more generally, normality) is not preserved under the unitary gauge freedom $L_k \to L_k' = \sum_j u_{kj} L_j$ (which leaves the Lindbladian invariant for unitary $u$).… [continue reading]
[Summary: Constantly evolving tests for what counts as worryingly powerful AI are mostly a consequence of how hard it is to design tests that will identify the real-world power of future automated systems. I argue that Alan Turing in 1950 could not have reliably distinguished a typical human from an appropriately fine-tuned GPT-4, yet all our current automated systems cannot produce growth above historic trends.]
What does the phenomenon of “moving the goalposts” for what counts as AI tell us about AI?
It’s often said that people repeatedly revising their definition of AI, often in response to previous AI tests being passed, is evidence that people are denying/afraid of reality, and want to put their heads in the sand or whatever. There’s some truth to that, but that’s a comment about humans and I think it’s overstated.
Closer to what I want to talk about is the idea that AI is continuously redefined to mean “whatever humans can do that hasn’t been automated yet”, which is often taken to be evidence that AI is not a “natural” kind out there in the world, but rather just a category relative to current tech. There’s also truth to this, but it’s not exactly what I’m interested in.
To me, it is startling that (I claim) we have systems today that would likely pass the Turing test if administered by Alan Turing, but that have negligible impact on a global scale. More specifically, consider fine-tuning GPT-4 to mimic a typical human who lacks encyclopedic knowledge of the contents of the internet. Suppose that it’s mimicking a human with average intelligence whose occupation has no overlap with Alan Turing’s expertise.… [continue reading]
Here’s a collection of reviews of the arguments that artificial general intelligence represents an existential risk to humanity. They vary greatly in length and style. I may update this from time to time.
Here, from my perspective, are some different true things that could be said, to contradict various false things that various different people seem to believe, about why AGI would be survivable on anything remotely resembling the current pathway, or any other pathway we can easily jump to.
This report examines what I see as the core argument for concern about existential risk from misaligned artificial intelligence. I proceed in two stages. First, I lay out a backdrop picture that informs such concern. On this picture, intelligent agency is an extremely powerful force, and creating agents much more intelligent than us is playing with fire -- especially given that if their objectives are problematic, such agents would plausibly have instrumental incentives to seek power over humans. Second, I formulate and evaluate a more specific six-premise argument that creating agents of this kind will lead to existential catastrophe by 2070. On this argument, by 2070: (1) it will become possible and financially feasible to build relevantly powerful and agentic AI systems; (2) there will be strong incentives to do so; (3) it will be much harder to build aligned (and relevantly powerful/agentic) AI systems than to build misaligned (and relevantly powerful/agentic) AI systems that are still superficially attractive to deploy; (4) some such misaligned systems will seek power over humans in high-impact ways; (5) this problem will scale to the full disempowerment of humanity; and (6) such disempowerment will constitute an existential catastrophe.
… [continue reading]