(1)
where and are coherent states, is the mean phase-space position of the two states, “” denotes the convolution, and is the (Gaussian) quasicharacteristic function of the ground state of the harmonic oscillator.
The quasicharacteristic function for a quantum state of a single degree of freedom is defined as
where is the Weyl phase-space displacement operator, are coordinates on “reciprocal” (i.e., Fourier transformed) phase space, is the phase-space location operator, and are the position and momentum operators, “” denotes the Hilbert-Schmidt inner product on operators, , and “” denotes the symplectic form, . (Throughout this post I use the notation established in Sec. 2 of my recent paper with Felipe Hernández.) It has variously been called the quantum characteristic function, the chord function, the Wigner characteristic function, the Weyl function, and the moment-generating function. It is the quantum analog of the classical characteristic function.
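If you want to poke at these definitions numerically, here is a small Python sketch (my own illustration, not from the post; it sets ℏ = 1 and labels displacements by a single complex number β, as is conventional for coherent states, rather than the symplectic coordinates used above). It builds the displacement operator on a truncated Fock space and evaluates the quasicharacteristic function as a trace:

```python
import numpy as np

N = 60  # Fock-space truncation
a = np.diag(np.sqrt(np.arange(1, N)), 1)  # annihilation operator

def expm(M):
    # Dependency-free scaling-and-squaring Taylor matrix exponential.
    s = max(0, int(np.ceil(np.log2(max(1.0, np.linalg.norm(M, 2))))) + 1)
    A, out, term = M / 2**s, np.eye(N, dtype=complex), np.eye(N, dtype=complex)
    for k in range(1, 30):
        term = term @ A / k
        out = out + term
    for _ in range(s):
        out = out @ out
    return out

def displace(beta):
    # Weyl displacement operator D(beta) = exp(beta a^dag - conj(beta) a)
    return expm(beta * a.conj().T - np.conj(beta) * a)

def chi(rho, beta):
    # Quasicharacteristic function: chi(beta) = Tr[rho D(beta)]
    return np.trace(rho @ displace(beta))

ground = np.zeros((N, N), dtype=complex)
ground[0, 0] = 1.0  # oscillator ground state |0><0|
```

For the ground state this reproduces the advertised Gaussian, χ(β) = e^(−|β|²/2), and χ(0) = Tr ρ = 1 for any normalized state.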
Importantly, the quasicharacteristic function obeys and , just like the classical characteristic function, and provides a definition of the Wigner function where the linear symplectic symmetry of phase space is manifest:
(2)
where is the phase-space coordinate and is the position-space representation of the quantum state. This first line says that and are related by the symplectic Fourier transform. (This just means the inner product “” in the regular Fourier transform is replaced with the symplectic form, and has the simple effect of exchanging the reciprocal variables, , simplifying many expressions.) The second line is often taken as the definition of the Wigner function, but it suffers from explicitly breaking symmetry in phase space, unnecessarily privileging position over momentum. The above relations make it clear that is yet another 1-to-1 representation of a quantum state.
First, we will need these checkable properties of the displacement operator,
(3)
from which we can invert the definition of the quasicharacteristic function:
(4)
Next, take to be an arbitrary normalized pure wavefunction (i.e., ) that will serve as a “reference wavepacket”. This is typically taken to be a wavepacket with little momentum that is well localized around the origin in configuration space, that is, a state whose Wigner function is mostly concentrated around the origin in phase space. Then we define to be the reference wavepacket displaced in phase space by the vector . We call the set the “wavepacket basis”; it forms an overcomplete basis (formally, a frame) of the Hilbert space, in particular providing a resolution of the identity . For concreteness, you can if you like take to be the ground state of the harmonic oscillator, i.e., a Gaussian with zero expectation of position and momentum: for some characteristic spatial scale ; this makes the set of coherent states.
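As a numerical aside (again my sketch, with ℏ = 1 and complex coherent-state labels α), the resolution of the identity can be checked by a brute-force Riemann sum over phase space:

```python
import numpy as np
from math import factorial

N = 6  # small Fock truncation; we only check the low-lying block

def coherent(alpha):
    # Fock components <n|alpha> = exp(-|alpha|^2/2) alpha^n / sqrt(n!)
    n = np.arange(N)
    fact = np.array([float(factorial(k)) for k in n])
    return np.exp(-abs(alpha)**2 / 2) * alpha**n / np.sqrt(fact)

# Riemann sum approximating (1/pi) * integral of |alpha><alpha| d^2alpha
step, rad = 0.2, 4.0
acc = np.zeros((N, N), dtype=complex)
for xx in np.arange(-rad, rad + step / 2, step):
    for yy in np.arange(-rad, rad + step / 2, step):
        v = coherent(xx + 1j * yy)
        acc += np.outer(v, v.conj()) * step**2 / np.pi
```

The sum reproduces the identity on the low-lying Fock levels; the highest levels in the truncation are only missed because the integration region is finite.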
Now we consider the matrix elements in the wavepacket basis. Unlike for an orthonormal basis, there is no sharp distinction between off-diagonal and on-diagonal matrix elements. Rather, can be considered roughly off-diagonal whenever and are sufficiently far apart^{a } that . Large off-diagonal terms are indicative of long-range coherence in phase space, where “large” is relative to how closely it saturates the Cauchy-Schwarz inequality
(5)
For instance, if is the coherent superposition^{b } of two widely separated wavepackets and , and if is the corresponding incoherent mixture, then but , with all on-diagonal elements the same: .
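Here is a quick numerical illustration of this contrast (my sketch; the two wavepackets are coherent states at α = ∓3 with ℏ = 1):

```python
import numpy as np
from math import factorial

N = 40  # enough Fock levels for |alpha| = 3 wavepackets

def coherent(alpha):
    n = np.arange(N)
    fact = np.array([float(factorial(k)) for k in n])
    return np.exp(-abs(alpha)**2 / 2) * complex(alpha)**n / np.sqrt(fact)

w1, w2 = coherent(-3.0), coherent(3.0)   # two widely separated wavepackets

cat = w1 + w2
cat = cat / np.linalg.norm(cat)          # coherent superposition
rho_cat = np.outer(cat, cat.conj())
rho_mix = 0.5 * (np.outer(w1, w1.conj()) + np.outer(w2, w2.conj()))  # mixture

offdiag_cat = w1.conj() @ rho_cat @ w2   # large: long-range phase-space coherence
offdiag_mix = w1.conj() @ rho_mix @ w2   # essentially zero
diag_cat = w1.conj() @ rho_cat @ w1      # on-diagonal elements agree for both
diag_mix = w1.conj() @ rho_mix @ w1
```

The superposition has off-diagonal element ≈ 1/2 while the mixture’s is of order e^(−18), even though the on-diagonal elements of the two states agree.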
We can then compute
(6)
and using the shorthands and for the phase-space mean and separation between the two wavepackets and , we have
(7)
where is the quasicharacteristic function of the reference wavepacket . Several things could be said about this expression, especially if we introduced the twisted convolution, but let’s just observe that if for outside some region in reciprocal phase space then only “knows” about when is in that region translated so it’s centered around . Furthermore, it only knows about the part proportional to the local Fourier component . In particular, if our reference wavepacket is Gaussian, , then , so that is essentially determined by the values takes in a -sized region centered around .
From this one can also quickly check that if we take the squared norm of this off-diagonal matrix element and integrate over the entire phase space with a fixed value of the separation between the two points and , we get
(8)
where “” denotes the convolution. So we find that the “total amount of coherence” over the phase-space distance (i.e., the summed amount of coherence between all pairs of wavepackets separated by ) is encoded in the value of in a small -sized region around . In the aforementioned Gaussian case, we have .
Although I’m not sure they would phrase it this way, the key idea for me was that merely protecting massive superpositions from decoherence is actually not that hard; sufficient isolation can be achieved in lots of systems. Rather, much like quantum computing, the challenge is to achieve this level of protection while simultaneously having sufficient control to create and measure superpositions.
Carney et al. observe that you do not need to be able to implement a Hadamard-like gate (i.e., a gate that takes a state in the preferred quasi-classical basis^{a } to superpositions thereof) on the massive system in order to demonstrate that it’s storing quantum information. You just need to be able to implement a controlled unitary on it, in any basis, that is controlled by a second (smaller) quantum system that you do have more complete control over. More specifically, they suggest starting with the control system in a superposition of two near “gravitational eigenstates” and , allowing this state to become entangled with an initial state of a massive oscillator (decohering both the oscillator and the control system), and then witnessing recoherence (revival) of the control system as the oscillator disentangles into a final state . (By “gravitational eigenstates”, I just mean states of the oscillator that, to a good approximation, source a quasiclassical state of gravitational field; in this case, it’s something like a wavepacket that’s well localized in space, rather than being in a superposition of widely separated positions that would have distinctly different corresponding gravitational fields.) For this, all that needs to be achieved is evolution of the form
(1)
where (or at least is less than for partial decoherence). Importantly, at no time does the massive oscillator need to be brought into a coherent superposition like of gravitational eigenstates. Furthermore, it doesn’t even matter whether and are the same. If you can implement this evolution and witness the revival, and you can convince yourself that the control system couldn’t have been entangling with anything else, then you have shown that the gravitational field is transmitting quantum information.
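A cartoon of this evolution is easy to simulate. In the sketch below (my toy model, not the authors’ Hamiltonian) a conditional displacement of a truncated oscillator stands in for the gravitational interaction: the control qubit’s coherence is suppressed while the two oscillator branches are separated, and fully revives when they recombine:

```python
import numpy as np

N = 30  # Fock truncation for the oscillator
a = np.diag(np.sqrt(np.arange(1, N)), 1)

def expm(M):
    # Dependency-free scaling-and-squaring matrix exponential
    s = max(0, int(np.ceil(np.log2(max(1.0, np.linalg.norm(M, 2))))) + 1)
    A, out, term = M / 2**s, np.eye(N, dtype=complex), np.eye(N, dtype=complex)
    for k in range(1, 30):
        term = term @ A / k
        out = out + term
    for _ in range(s):
        out = out @ out
    return out

def displace(beta):
    return expm(beta * a.conj().T - np.conj(beta) * a)

vac = np.zeros(N, dtype=complex)
vac[0] = 1.0

# Control starts in (|0> + |1>)/sqrt(2); the oscillator branch conditioned on |1>
# gets kicked by a conditional displacement (a stand-in interaction).
beta = 2.0
branch0 = vac
branch1 = displace(beta) @ vac

# Control coherence = |<branch0|branch1>| / 2 (off-diagonal of its reduced state)
coh_entangled = abs(branch0.conj() @ branch1) / 2  # suppressed: control decohered

# An (idealized) full oscillation undoes the kick, disentangling the two systems.
branch1 = displace(-beta) @ branch1
coh_revived = abs(branch0.conj() @ branch1) / 2    # full revival
```

Note that the oscillator itself is never put in a superposition of widely separated wavepackets conditional on a *single* control state; the coherence lives in the joint system and is recovered at the control qubit.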
The general idea of leveraging quantum control of a small system to gain partial quantum control of a large system isn’t novel, but the authors go on to show that
The first part is unintuitive, but you can basically read it off from Eq. (1): the decoherence and recoherence can still happen even if the massive oscillator starts and ends in mixtures of states and , just as long as the disentangling happens at the same moment in time (to very high accuracy) for all members of the ensemble. (That’s where the strong harmonicity assumption comes in.) Furthermore, the initial and final ensembles don’t have to be the same. In particular, the contraction of the state of the oscillator by, say, one quantum over the course of a single period (e.g., from to ) doesn’t prevent you from having a pretty clean revival. (The visibility will only go down insofar as the contraction is so strong that the oscillator states are piling up on top of each other near the ground state, preventing full disentanglement.)
Instead, the only thing you’re really worrying about is isolation, i.e., the extent to which you can prevent the two paths of the oscillator (each conditional on different states and of the control system) from getting decohered by the larger environment.
Here are some concerns that I haven’t fleshed out yet:
Naturally, I was very interested to know whether I could shoe-horn this idea into my hobby horse: decoherence detection. Unfortunately, at first blush it looks like the ideas can’t be combined. Here’s why.
The standard QBM parameters used by Carney et al. for the open-system dynamics of the oscillator are the mean excitation number (were the oscillator allowed to thermalize to the bath) and the dissipation coefficient . My preferred parameters are the decoherence-and-diffusion matrix and (described in detail here), and they are related by (where my parameters are more general in the sense that they allow for the decoherence-and-diffusion matrix to be not proportional to the identity ).
Anomalous pure^{d } decoherence (e.g., from collisional decoherence like dark matter (DM), or from objective collapse models like Diosi-Penrose) is the case of the simultaneous limits , while holding constant. This is the sense in which idealized collisional decoherence looks like an infinite-temperature bath, and it’s a natural model to consider for DM because 1 MeV virialized DM is at ~6000 Kelvin. (Once the DM mass is below 10 keV, the infinite-temperature approximation breaks down. Also, for usual collisional decoherence, is not actually proportional to the identity, but I don’t think it matters much for what I’m going to say…) To get the complete reduced dynamics for the oscillator, you would basically just add this pure decoherence to the other conventional sources of noise (which generally are dissipative, ).
This means when you have an oscillator with conventional sources of noise, and you add anomalous decoherence, you expect to raise the equilibrium temperature of the oscillator, and hence raise the thermalized occupation number , but you do not change or . Generally the experimentally measurable quantities are and , and the bare diffusion matrix for conventional sources alone is inaccessible.
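In toy numbers (a loudly hedged sketch: I use dimensionless units, and the simple scalar relation D = γ(2n̄ + 1) is my stand-in for the parameter relation elided above):

```python
# Dimensionless toy version; the scalar relation D = gamma * (2*nbar + 1) is an
# assumed stand-in for the QBM parameter relation, with made-up numbers.
gamma = 1e-3        # conventional dissipation coefficient (hypothetical value)
nbar_conv = 10.0    # thermal occupation from conventional noise alone
D_conv = gamma * (2 * nbar_conv + 1)

D_anom = 5e-3       # anomalous pure decoherence: extra diffusion, zero extra dissipation
nbar_eff = ((D_conv + D_anom) / gamma - 1) / 2  # occupation inferred at unchanged gamma
```

Adding the anomalous diffusion raises the inferred equilibrium occupation (here from 10 to 12.5) while γ is untouched, which is the claimed signature: only n̄ shifts.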
Unfortunately, the protocol of Carney et al. doesn’t really change this. All it can detect is the total strength of . You could try and distinguish anomalous decoherence from conventional sources during the protocol by using the various tricks I’ve talked about (shielding the experiment from DM, looking for sidereal variations, etc.), but it would be a hell of a lot easier to just use those tricks while simply measuring the equilibrium temperature of the oscillator — no quantum mechanics required.^{e }
This also fits with my interpretation of superpositions as “negative temperature detectors” (see first figure in this blog post). Superpositions are a useful way to get increased sensitivity when you’ve already maxed out the amount of sensitivity you can get from cooling your target (because you’ve hit the ground state). But the whole point of the Carney et al. protocol is that it doesn’t care what the temperature of the massive oscillator is.
A bit disappointing, but I will keep thinking about variations on this…
After the intro, the authors give self-contained background information on the two key prerequisites: quantum Darwinism and generalized probabilistic theories (GPTs). The former is an admirable brief summary of what are, to me, the core and extremely simple features of quantum Darwinism. This and the summary of GPT can be skipped by familiar readers, but I recommend reading definitions 1-4 in the latter part of the GPT subsection, plus the “Summary of Assumptions”.
The main results seem to be
..one needs to consider the possibility of [classical-information-spreading dynamic] that preserves the statistics of [measurements on the system] S but still changes the state of S, even if S is prepared in one of the [(generalized) pointer states]. This is impossible in quantum theory…However, many GPT systems (such as gbits [18]) violate the analogous operational condition…Thus, definition 5 captures the essential features for ideal Darwinism on the operational level, while definition 6 further requests classical features from the frame states themselves.
In [quantum theory], the fan-out gate [classical-information-spreading dynamic] can create entanglement whenever the system is not initialized to a pointer state….entanglement–creation is a necessary property of any generalized ideal Darwinism process.
The reversible qualifier is key here, as this statement excludes the possibility of Darwinism in classical models, whereas we of course know it’s possible to copy classical information in classical models with irreversible Markovian classical dynamics. Thus
In particular, this rules out Darwinism in boxworld [18] (a theory containing the aforementioned gbits) or any dichotomic maximally nonlocal theory. For these specific examples, one could also infer this from Refs. [40, 44], but here we have shown it without having to determine the complete structure of the reversible transformations.
Philosophically and aesthetically I like the idea of GPTs as an operational framework for thinking about the foundations of quantum mechanics, although we should all be quite skeptical that GPTs besides quantum theory — either more or less expansive — will be found to describe any fundamental physics (even though I think it’s quite plausible that quantum theory will eventually be superseded by something). This is because, among other things, GPTs treat space and time very differently, and especially because they take time asymmetry as fundamental rather than emergent or a consequence of initial conditions.
The practical downside of GPTs is that there’s been a whole industry of papers exploring non-classical-or-quantum GPTs that don’t maintain contact with what could describe the real world; it’s too much fun to play with the math. This paper was a welcome exception, as it helps clarify which theories could lead to the appearance of classicality, at least insofar as the latter is identified with quantum Darwinism.
It seems clear enough to me that, within the field of journalism, the distinction between opinion pieces and “straight reporting” is both meaningful and valuable to draw. Both sorts of works should be pursued vigorously, even by the same journalists at the same time, but they should be distinguished (e.g., by being placed in different sections of a newspaper, or being explicitly labeled “opinion”, etc.) and held to different standards.^{a } This is true even though there is of course a continuum between these categories, and it’s infeasible to precisely quantify the axis. (That said, I’d like to see more serious philosophical attempts to identify actionable principles for drawing this distinction more reliably and transparently.)
It’s easy for idealistic outsiders to get the impression that all of respectable scientific research is analogous to straight reporting rather than opinion, but just about any researcher will tell you that some articles are closer than others to the opinion category; that’s not to say it’s bad or unscientific, just that such articles go further in the direction of speculative interpretation and selective highlighting of certain pieces of evidence, and are often motivated by normative claims (“this area is a more fruitful research avenue than my colleagues believe”, “this evidence implies the government should adopt a certain policy”, etc.). Let’s call this “scientific opinion”, as opposed to “straight research”, even though we again concede that the distinction is fuzzy.
For the most part, there isn’t a distinction in scientific journals between these types of articles (with some notable exceptions, especially in certain splashy scientific magazines that have articles that resemble newspaper columns). Rather, editors and referees seem to enforce a certain level of objectivity for all articles, and this usually takes the form of (1) an objective-sounding voice and (2) outright rejection of articles that are too opinion-y. I mostly think this works fine, although the clear need for some scientific opinion writing means that (a) editors and referees allow some opinion-y type stuff without attempts to explicitly distinguish it from straight research and (b) some scientific opinion discussion that is quite influential ends up getting unfortunately relegated to blogs and in-person discussion.
In recent news, a prominent research scientist got fired from an industry lab for writing a scientific article that made her employer look bad. Lots of folks reasonably (though not necessarily persuasively) argue either that (i) corporations should always support their internal research even when it makes them look bad or that (ii) it’s naive to think that corporations could allow this, so attempts to force them to will just induce them to do less research in the future.
I wonder if constructive progress could be made by relying on the distinction between straight research and scientific opinion. My second-hand impression of the article prompting the current dispute is that it was less a follow-your-nose, just-the-facts-ma’am investigation, and more a scientific opinion piece. I have not read the actual work, but for the purposes of this post it actually doesn’t matter because I just want to consider the general possibility of using this distinction. So for the sake of argument, let’s consider a hypothetical situation where an industry researcher wants to publish an article that makes her employer look bad and the article is well characterized as scientific opinion, i.e., is scientifically sound (no falsehoods, rigorous, etc.) but also clearly displays aspects of opinion (obviously motivated by normative claims rather than just scientific curiosity, emphasizing evidence on one side, and somewhat speculative).
In journalism, most people agree that opinion pieces are very useful and should exist, but also that employees can’t write anti-employer editorials in the New York Times while reasonably demanding that their employer keep paying them. (A few folks will reject this latter claim, but I think they can only reasonably do so by endorsing a very strong version of the principle of freedom of speech, going beyond even John Stuart Mill.) It seems then that you might be able to protect an important subset of research freedom in industrial labs by establishing a norm that straight research should be shielded from being vetoed from above, while corporate leaders are permitted to exercise control on scientific opinion. If this could be achieved, outsiders could naturally put a bit more faith in straight research from industry while remaining appropriately skeptical of scientific opinion pieces that they generate.
Of course, if the decision about whether any given article was an opinion piece were decided in-house by the company, it would be very hard to trust, and also very hard to make transparent without destroying the company’s ability to meaningfully protect itself from bad PR. Therefore, you could consider an outside arbiter (e.g., certain independent scientific journals) who would certify articles as straight research through a review process that uses publicly-stated principles but that, necessarily, did not release the research article publicly until after it had been certified in this way. (This is not unusually secretive; journals already reject articles without publicly explaining why.) If companies were convinced that this process was at least sort of reliable and objective, they might publicly commit to never vetoing straight research, as certified by the outside arbiter, both as a way to establish trust in the public and as a way to attract research talent by credibly committing to a limited form of academic freedom.
If you think that industry research currently enjoys, or could plausibly achieve, university levels of freedom when it comes to any results that make the employer look bad, then you would of course see this proposal as a step backwards. However, it seems clear to me that corporate leaders have always had a veto on sufficiently unpalatable research results, and that if they were somehow forced to relinquish this power (due to public opinion), the net result would simply be them choosing to not fund research that had a chance of turning out badly for them. Rather, the idea considered here is just to somewhat constrain the leadership veto and, importantly, make it more principled and transparent.
[Added:] Of course, this proposal doesn’t fix all conflicts between research and academic freedom. Some straight research will make a company look clearly bad without any need for speculation or unusual emphasis. But I think this proposal is plausibly an improvement over the status quo because (1) opinion is more likely to generate bad PR but (2) straight research is “less replaceable” in the sense that outsiders will have an easier time writing critical opinion pieces if they have access to straight research from insiders. [I thank Dylan Hadfield-Menell and Graeme Smith for conversation that informed this post.]
(1)
where the arrows over partial derivatives tell you which way they act, i.e., . This only becomes slightly less weird when you use the equivalent formula , where “” is the Moyal star product given by
(2)
The star product has the crucial feature that , where we use a hat to denote the Weyl transform (i.e., the inverse of the Wigner transform taking density matrices to Wigner functions), which takes a scalar function over phase-space to an operator over our Hilbert space. The star product also has some nice integral representations, which can be found in books like Curtright, Fairlie, & Zachos^{a }, but none of them help me understand the Moyal equation.
A key problem is that both of these expressions neglect the (affine) symplectic symmetry of phase space and the dynamical equations. Although I wouldn’t call it beautiful, we can re-write the star product as
(3)
where is a symplectic index using the Einstein summation convention, and where symplectic indices are raised and lowered using the symplectic form just as for Weyl spinors: and , where is the antisymmetric symplectic form with , and where upper (lower) indices denote symplectic vectors (co-vectors).
With this, we can expand the Moyal equation as
where we can see in hideous explicitness that it’s a series in the even powers of and the odd derivatives of the Hamiltonian and the Wigner function . Furthermore, as we see it quickly reduces to the Poisson bracket .
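One can verify this reduction symbolically. The sketch below (my own check, assuming SymPy is available, and using Groenewold’s expansion of the Moyal bracket, whose leading quantum correction involves only third derivatives) confirms that the O(ℏ²) term vanishes identically for a quadratic Hamiltonian, leaving the bare Poisson bracket:

```python
import sympy as sp

x, p, hbar = sp.symbols('x p hbar', real=True)
W = sp.Function('W')(x, p)  # stand-in for the Wigner function

def poisson(f, g):
    return sp.diff(f, x) * sp.diff(g, p) - sp.diff(f, p) * sp.diff(g, x)

def dmix(f, nx, npp):
    # nx-th x-derivative followed by npp-th p-derivative
    out = f
    for _ in range(nx):
        out = sp.diff(out, x)
    for _ in range(npp):
        out = sp.diff(out, p)
    return out

def moyal_order_hbar2(f, g):
    # O(hbar^2) term of the Moyal bracket:
    # -(hbar^2/24) * f (<-d_x -> d_p  -  <-d_p -> d_x)^3 g
    total = 0
    for k in range(4):
        total += (-1)**k * sp.binomial(3, k) * dmix(f, 3 - k, k) * dmix(g, k, 3 - k)
    return -hbar**2 / 24 * total

H = (p**2 + x**2) / 2  # harmonic oscillator: all third derivatives vanish
correction = sp.simplify(moyal_order_hbar2(H, W))
bracket = sp.simplify(poisson(H, W))
```

Because every term in the correction contains a third derivative of H, any quadratic Hamiltonian (indeed, every order beyond the Poisson bracket) drops out, which is why harmonic evolution of the Wigner function is exactly classical.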
Advanced quantum computing comes with some new applications as well as a few risks, most notably threatening the foundations of modern online security.
In light of the recent experimental crossing of the “quantum supremacy” milestone, it is of great interest to estimate when devices capable of attacking typical encrypted communication will be constructed, and whether the development of communication protocols that are secure against quantum computers is progressing at an adequate pace.
Beyond its intrinsic interest, quantum computing is also fertile ground for quantified forecasting. Exercises in forecasting technological progress have generally been sparse — with some notable exceptions — but such forecasting is of great importance: technological progress dictates a large part of human progress.
To date, most systematic predictions about development timelines for quantum computing have been based on expert surveys, in part because quantitative data about realistic architectures has been limited to a small number of idiosyncratic prototypes. However, in the last few years the number of devices has been rapidly increasing, and it is now possible to squint through the fog of research and make some tentative extrapolations. We emphasize that our quantitative model should be considered to at most augment, not replace, expert predictions. Indeed, as we discuss in our preprint, this early data is noisy, and we necessarily must make strong assumptions to say anything concrete.
Our first step was to compile an imperfect dataset of quantum computing devices developed so far, spanning 2003-2020. This dataset is freely available, and we encourage others to build on it in the future as the data continues to roll in.
To quantify progress, we developed our own index – the generalized logical qubit – that combines two important metrics of performance for quantum computers: the number of physical qubits and the error rate for two-qubit gates. Roughly speaking, the generalized logical qubit is the number of noiseless qubits that could be simulated with quantum error correction using a given number of physical qubits with a given gate error. Importantly, the metric can be extended to fractional values, allowing us to consider contemporary systems that are unable to simulate even a single qubit noiselessly.
To extrapolate historical progress into the future, we focus on superconducting-qubit devices. We make our key assumption of exponential progress on the two main metrics and use statistical bootstrapping to build confidence intervals around when our index metric will cross the frontier where it will be able to threaten the popular cryptographic protocol RSA 2048.
Note that since we are modelling progress on each metric separately, we are ignoring the interplay between the two metrics. But a simple statistical check shows that the metrics are likely negatively correlated within each system – that is, quantum computer designers face a trade-off between increasing the number of qubits and the gate quality. Ignoring this coupling between the metrics results in an optimistic model.
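For readers curious about the mechanics, here is a minimal sketch of the exponential-fit-plus-bootstrap procedure on synthetic data (emphatically not our dataset; the growth rate, noise level, and threshold are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: a progress metric assumed to grow exponentially
# (e-folding rate 0.25/yr) with multiplicative noise. All numbers hypothetical.
years = np.arange(2005, 2021)
metric = np.exp(0.25 * (years - 2005) + rng.normal(0, 0.3, years.size))
threshold = np.exp(0.25 * 25)  # hypothetical "breaks RSA-2048" level (~year 2030)

def crossing_year(ys, ms):
    # Fit log(metric) linearly in time and extrapolate to the threshold.
    slope, intercept = np.polyfit(ys, np.log(ms), 1)
    return (np.log(threshold) - intercept) / slope

# Bootstrap: resample data points with replacement, refit, collect crossing years.
draws = []
for _ in range(1000):
    idx = rng.integers(0, years.size, years.size)
    draws.append(crossing_year(years[idx], metric[idx]))
lo, hi = np.percentile(draws, [2.5, 97.5])  # 95% confidence interval
```

The spread of the bootstrap distribution is what turns a single best-fit extrapolation into an honest interval over crossing dates.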
As a point of comparison, last year Piani and Mosca surveyed quantum computing experts and found that 22.7% think it is likely or highly likely that quantum computers will be able to crack RSA-2048 keys by 2030, and 50% think that it is likely or highly likely that we will be able to crack RSA-2048 keys by 2035.
Will this be enough time to deploy adequate countermeasures? I discuss this in depth in this other article about quantum cryptanalysis. Given the current rate of progress on the standardization of quantum-resistant cryptography there seems to be little reason for concern (though it should be considered that the yearly base rate for discontinuous breakthroughs for any given technology is about 0.1%).
If you are interested in better understanding our model and assumptions, I encourage you to check out our preprint on the arXiv.
Here is the abstract:
There is a vast number of people who will live in the centuries and millennia to come. In all probability, future generations will outnumber us by thousands or millions to one; of all the people who we might affect with our actions, the overwhelming majority are yet to come. In the aggregate, their interests matter enormously. So anything we can do to steer the future of civilization onto a better trajectory, making the world a better place for those generations who are still to come, is of tremendous moral importance. Political science tells us that the practices of most governments are at stark odds with longtermism. In addition to the ordinary causes of human short-termism, which are substantial, politics brings unique challenges of coordination, polarization, short-term institutional incentives, and more. Despite the relatively grim picture of political time horizons offered by political science, the problems of political short-termism are neither necessary nor inevitable. In principle, the State could serve as a powerful tool for positively shaping the long-term future. In this chapter, we make some suggestions about how we should best undertake this project. We begin by explaining the root causes of political short-termism. Then, we propose and defend four institutional reforms that we think would be promising ways to increase the time horizons of governments: 1) government research institutions and archivists; 2) posterity impact assessments; 3) futures assemblies; and 4) legislative houses for future generations. We conclude with five additional reforms that are promising but require further research. To fully resolve the problem of political short-termism we must develop a comprehensive research program on effective longtermist political institutions.
In the rest of the post, I am going to ask a few pointed questions and make comments. Fair warning: I am trying to get back into frequent low-overhead blogging, so this post is less polished by design, and won’t be very useful if you don’t read the paper (since I don’t summarize it). My comments are largely critical, but needless to say I usually only bother to comment on the tiny minority of papers that I think are important and interesting, which this certainly is.
I know this is just an early attempt at formalizing these ideas, but I would want to see substantially more discussion of the public choice problems that will arise with all these proposals, not just the legislative house. I think such problems are immediate and large (i.e., not just a perturbation that can be handled later), and would strongly drive the best solution. In particular:
Lastly, some tangents:
Although the way this problem tends to be formalized varies with context, I don’t think we have confidence in any of the formalizations. The different versions are very tightly related, so a solution in one context is likely to give, or at least strongly point toward, solutions for the others.
As a time-saving device, I will just quote a few paragraphs from existing papers that review the literature, along with the relevant part of their list of references. I hope to update this from time to time, and perhaps turn it into a proper review article of its own one day. If you have a recommendation for this bibliography (either a single citation, or a paper I should quote), please do let me know.
From “Quantum Mereology: Factorizing Hilbert Space into Subsystems with Quasi-Classical Dynamics”, arXiv:2005.12938:
While this question has not frequently been addressed in the literature on quantum foundations and emergence of classicality, a few works have highlighted its importance and made attempts to understand it better. Brun and Hartle [2] studied the emergence of preferred coarse-grained classical variables in a chain of quantum harmonic oscillators. Efforts to address the closely related question of identifying classical sets of histories (also known as the “Set Selection” problem) in the Decoherent Histories formalism [3–7, 10] have also been undertaken. Tegmark [9] has approached the problem from the perspective of information processing ability of subsystems and Piazza [8] focuses on emergence of spatially local subsystem structure in a field-theoretic context. Hamiltonian-induced factorizations of Hilbert space which exhibit k-local dynamics have also been studied by Cotler et al. [14]. The idea that tensor product structures and virtual subsystems can be identified with algebras of observables was originally introduced by Zanardi et al. in [15, 16] and was further extended in Kabernik, Pollack and Singh [17] to induce more general structures in Hilbert space. In a series of papers (e.g. [18–21]; see also [22]) Castagnino, Lombardi, and collaborators have developed the self-induced decoherence (SID) program, which conceptualizes decoherence as a dynamical process which identifies the classical variables by inspection of the Hamiltonian, without the need to explicitly identify a set of environment degrees of freedom. Similar physical motivations but different mathematical methods have led Kofler and Brukner [23] to study the emergence of classicality under restriction to coarse-grained measurements.
[1] S. M. Carroll and A. Singh, “Mad-Dog Everettianism: Quantum Mechanics at Its Most Minimal,” arXiv:1801.08132 [quant-ph].
[2] T. A. Brun and J. B. Hartle, “Classical dynamics of the quantum harmonic chain,” Physical Review D 60 no. 12, (1999) 123503.
[3] M. Gell-Mann and J. Hartle, “Alternative decohering histories in quantum mechanics,” arXiv:1905.05859 (2019).
[4] F. Dowker and A. Kent, “On the consistent histories approach to quantum mechanics,” Journal of Statistical Physics 82 no. 5-6, (1996) 1575–1646.
[5] A. Kent, “Quantum histories,” Physica Scripta 1998 no. T76, (1998) 78.
[6] C. Jess Riedel, W. H. Zurek, and M. Zwolak, “The rise and fall of redundancy in decoherence and quantum Darwinism,” New Journal of Physics 14 no. 8, (Aug, 2012) 083010, arXiv:1205.3197 [quant-ph].
[7] R. B. Griffiths, “Consistent histories and the interpretation of quantum mechanics,” J. Statist. Phys. 36 (1984) 219.
[8] F. Piazza, “Glimmers of a pre-geometric perspective,” Found. Phys. 40 (2010) 239–266, arXiv:hep-th/0506124 [hep-th].
[9] M. Tegmark, “Consciousness as a state of matter,” Chaos, Solitons & Fractals 76 (2015) 238–270.
[10] J. P. Paz and W. H. Zurek, “Environment-induced decoherence, classicality, and consistency of quantum histories,” Physical Review D 48 no. 6, (1993) 2728.
[11] N. Bao, S. M. Carroll, and A. Singh, “The Hilbert Space of Quantum Gravity Is Locally Finite-Dimensional,” arXiv:1704.00066 [hep-th].
[12] T. Banks, “Quantum Mechanics and Cosmology.” Talk given at the festschrift for L. Susskind, Stanford University, May 2000.
[13] W. Fischler, “Taking de Sitter Seriously.” Talk given at Role of Scaling Laws in Physics and Biology (Celebrating the 60th Birthday of Geoffrey West), Santa Fe, Dec., 2000.
[14] J. S. Cotler, G. R. Penington, and D. H. Ranard, “Locality from the spectrum,” Communications in Mathematical Physics 368 no. 3, (2019) 1267–1296.
[15] P. Zanardi, “Virtual quantum subsystems,” Phys. Rev. Lett. 87 (2001) 077901, arXiv:quant-ph/0103030 [quant-ph].
[16] P. Zanardi, D. A. Lidar, and S. Lloyd, “Quantum tensor product structures are observable induced,” Phys. Rev. Lett. 92 (2004) 060402, arXiv:quant-ph/0308043 [quant-ph].
[17] O. Kabernik, J. Pollack, and A. Singh, “Quantum State Reduction: Generalized Bipartitions from Algebras of Observables,” Phys. Rev. A 101 no. 3, (2020) 032303, arXiv:1909.12851 [quant-ph].
[18] M. Castagnino and O. Lombardi, “Self-induced decoherence: a new approach,” Studies in the History and Philosophy of Modern Physics 35 no. 1, (Jan, 2004) 73–107.
[19] M. Castagnino, S. Fortin, O. Lombardi, and R. Laura, “A general theoretical framework for decoherence in open and closed systems,” Class. Quant. Grav. 25 (2008) 154002, arXiv:0907.1337 [quant-ph].
[20] O. Lombardi, S. Fortin, and M. Castagnino, “The problem of identifying the system and the environment in the phenomenon of decoherence,” in EPSA Philosophy of Science: Amsterdam 2009, H. W. de Regt, S. Hartmann, and S. Okasha, eds., pp. 161–174. Springer Netherlands, Dordrecht, 2012.
[21] S. Fortin, O. Lombardi, and M. Castagnino, “Decoherence: A Closed-System Approach,” Brazilian Journal of Physics 44 no. 1, (Feb, 2014) 138–153, arXiv:1402.3525 [quant-ph].
[22] M. Schlosshauer, “Self-induced decoherence approach: Strong limitations on its validity in a simple spin bath model and on its general physical relevance,” Phys. Rev. A 72 no. 1, (Jul, 2005) 012109, arXiv:quant-ph/0501138 [quant-ph].
[23] J. Kofler and C. Brukner, “Classical World Arising out of Quantum Physics under the Restriction of Coarse-Grained Measurements,” Phys. Rev. Lett. 99 no. 18, (Nov, 2007) 180403, arXiv:quant-ph/0609079 [quant-ph].
From “The Objective past of a quantum universe: Redundant records of consistent histories”, arXiv:1312.0331:
“Into what mixture does the wavepacket collapse?” This is the preferred basis problem in quantum mechanics [1]. It launched the study of decoherence [2, 3], a process central to the modern view of the quantum-classical transition [4–9]. The preferred basis problem has been solved exactly for so-called pure decoherence [1, 10]. In this case, a well-defined pointer basis [1] emerges whose origins can be traced back to the interaction Hamiltonian between the quantum system and its environment [1, 2, 4]. An approximate pointer basis exists for many other situations (see, e. g., Refs. [11–17]).
The consistent (or decoherent) histories framework [18–21] was originally introduced by Griffiths. It has evolved into a mathematical formalism for applying quantum mechanics to completely closed systems, up to and including the whole universe. It has been argued that quantum mechanics within this framework would be a fully satisfactory physical theory only if it were supplemented with an unambiguous mechanism for identifying a preferred set of histories corresponding, at the least, to the perceptions of observers [22–29] (but see counterarguments [30–35]). This would address the Everettian [36] question: “What are the branches in the wavefunction of the Universe?” This defines the set selection problem, the global analog to the preferred basis problem.
It is natural to demand that such a set of histories satisfy the mathematical requirement of consistency, i.e., that their probabilities are additive. The set selection problem still looms large, however, as almost all consistent sets bear no resemblance to the classical reality we perceive [37–39]. Classical reasoning can only be done relative to a single consistent set [20, 31, 32]; simultaneous reasoning from different sets leads to contradictions [22–24, 40, 41]. A preferred set would allow one to unambiguously compute probabilities^{1} for all observations from first principles, that is, from (1) a wavefunction of the Universe and (2) a Hamiltonian describing the interactions.
To agree with our expectations, a preferred set would describe macroscopic systems via coarse-grained variables that approximately obey classical equations of motion, thereby constituting a “quasiclassical domain” [14, 23, 24, 40, 49, 50]. Various principles for its identification have been explored, both within the consistent histories formalism [15, 26, 39, 49, 51–56] and outside it [57–61]. None have gathered broad support.
^{1}We take Born’s rule for granted, putting aside the question of whether it should be derived from other principles [9, 36, 42–48] or simply assumed. That issue is independent of (and cleanly separated from) the topic of this paper.
[1] W. H. Zurek, Phys. Rev. D 24, 1516 (1981).
[2] W. H. Zurek, Phys. Rev. D 26, 1862 (1982).
[3] E. Joos and H. D. Zeh, Zeitschrift für Physik B Condensed Matter 59, 223 (1985).
[4] H. D. Zeh, Foundations of Physics 3, 109 (1973).
[5] W. H. Zurek, Physics Today 44, 36 (1991).
[6] W. H. Zurek, Rev. Mod. Phys. 75, 715 (2003).
[7] E. Joos, H. D. Zeh, C. Kiefer, D. Giulini, J. Kupsch, and I.-O. Stamatescu, Decoherence and the Appearance of a Classical World in Quantum Theory, 2nd ed. (Springer-Verlag, Berlin, 2003).
[8] M. Schlosshauer, Decoherence and the Quantum-to-Classical Transition (Springer-Verlag, Berlin, 2008); in Handbook of Quantum Information, edited by M. Aspelmeyer, T. Calarco, and J. Eisert (Springer, Berlin/Heidelberg, 2014).
[9] W. H. Zurek, Physics Today 67, 44 (2014).
[10] M. Zwolak, C. J. Riedel, and W. H. Zurek, Physical Review Letters 112, 140406 (2014).
[11] J. R. Anglin and W. H. Zurek, Physical Review D 53, 7327 (1996); D. A. R. Dalvit, J. Dziarmaga, and W. H. Zurek, Physical Review A 72, 062101 (2005).
[12] O. Kübler and H. D. Zeh, Annals of Physics 76, 405 (1973).
[13] W. H. Zurek, S. Habib, and J. P. Paz, Phys. Rev. Lett. 70, 1187 (1993).
[14] M. Gell-Mann and J. B. Hartle, Phys. Rev. D 47, 3345 (1993).
[15] M. Gell-Mann and J. B. Hartle, Phys. Rev. A 76, 022104 (2007).
[16] J. J. Halliwell, Phys. Rev. D 58, 105015 (1998).
[17] J. Paz and W. H. Zurek, Phys. Rev. Lett. 82, 5181 (1999).
[18] R. B. Griffiths, Journal of Statistical Physics 36, 219 (1984).
[19] R. Omnès, The Interpretation of Quantum Mechanics (Princeton University Press, Princeton, NJ, 1994).
[20] R. B. Griffiths, Consistent Quantum Theory (Cambridge University Press, Cambridge, UK, 2002).
[21] J. J. Halliwell, in Fundamental Problems in Quantum Theory, Vol. 775, edited by D. Greenberger and A. Zeilinger (Blackwell Publishing Ltd, 1995), arXiv:gr-qc/9407040.
[22] F. Dowker and A. Kent, Phys. Rev. Lett. 75, 3038 (1995).
[23] F. Dowker and A. Kent, Journal of Statistical Physics 82, 1575 (1996).
[24] A. Kent, Phys. Rev. A 54, 4670 (1996).
[25] A. Kent, Phys. Rev. Lett. 78, 2874 (1997).
[26] A. Kent and J. McElwaine, Phys. Rev. A 55, 1703 (1997).
[27] A. Kent, in Bohmian Mechanics and Quantum Theory: An Appraisal, edited by A. F. J. Cushing and S. Goldstein (Kluwer Academic Press, Dordrecht, 1996) arXiv:quant-ph/9511032.
[28] E. Okon and D. Sudarsky, Stud. Hist. Philos. Sci. B 48, Part A, 7 (2014).
[29] E. Okon and D. Sudarsky, arXiv:1504.03231 (2015).
[30] R. B. Griffiths and J. B. Hartle, Physical Review Letters 81, 1981 (1998).
[31] R. B. Griffiths, Physical Review A 57, 1604 (1998).
[32] R. B. Griffiths, Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics 44, 93 (2013).
On microscopic scales, sound is air pressure fluctuating in time. Taking the Fourier transform gives the frequency distribution, but only in an eternal sense that applies to the entire time interval at once.
Yet on macroscopic scales, sound is described as having a frequency distribution as a function of time, i.e., a note has both a pitch and a duration. There are many formalisms for describing this (e.g., wavelets), but a well-known limitation is that the frequency of a note is only well-defined up to an uncertainty that is inversely proportional to its duration.
At the mathematical level, a given wavefunction is almost exactly analogous: macroscopically a particle seems to have a well-defined position and momentum, but microscopically there is only the wavefunction. The mapping of the analogy^{a} is . Wavefunctions can of course be complex, but we can restrict ourselves to a real-valued wavefunction without any trouble; we are not worrying about the dynamics of wavefunctions, so you can pretend the Hamiltonian vanishes if you like.
In order to get the acoustic analog of Planck’s constant , it helps to imagine going back to a time when the pitch of a note was measured with a unit that did not have a known connection to absolute frequency, i.e., to inverse-time units. To my (very limited) understanding, by the 6th century BC it was already understood that an octave was the difference between two notes when one is vibrating twice as fast as another, but the absolute frequency (oscillations per second) of any particular note — say C♯ — was not known.^{b} Let’s arbitrarily declare to be the interval between pitches C♯ and C in the one-lined octave. Today we know that C♯ and C correspond to 277.18 and 261.63 Hz, respectively, so that corresponds to 64.31 ms, but in the distant past for humanity (or the very recent past for me), this was unknown.
If I listened very carefully, or at least if I built special equipment, I would find that the purity of a note’s pitch begins to degrade as the duration of the note approaches its inverse frequency; this would be a hint about the location of the acoustic microscopic scale. That is, I would find it harder and harder to distinguish between the notes C♯ and C as the duration of the notes approached the order of 60 ms (although it might happen with even longer durations due to imperfections of my ears/equipment). To confidently establish the relationship between perceived pitch and inverse time, I would probably want to listen to sound made by objects whose frequency of vibration I could measure directly. That would be easy today, but very difficult three thousand years ago.
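This back-of-the-envelope claim is easy to check numerically. Here is a small sketch (my own, not from the original post) that computes the normalized overlap between truncated C♯ and C tones: the two notes are nearly indistinguishable when the duration-bandwidth product is well below 1, and cleanly distinguishable when it is well above 1.

```python
import numpy as np

F_CSHARP, F_C = 277.18, 261.63        # Hz, one-lined octave
DF = F_CSHARP - F_C                   # ~15.55 Hz, so 1/DF ~ 64 ms

def overlap(T, fs=44100):
    """Normalized overlap of two pure tones truncated to duration T (seconds)."""
    t = np.arange(0, T, 1 / fs)
    s1 = np.cos(2 * np.pi * F_CSHARP * t)
    s2 = np.cos(2 * np.pi * F_C * t)
    return abs(s1 @ s2) / (np.linalg.norm(s1) * np.linalg.norm(s2))

for T in (1.0, 0.064, 0.010):
    print(f"T = {1000*T:6.1f} ms, T*DF = {T*DF:5.2f}, overlap = {overlap(T):.3f}")
```

An overlap near 1 means even ideal equipment cannot reliably tell the notes apart at that duration; the crossover happens around T ≈ 1/DF ≈ 64 ms, as claimed above.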
The analog of , then, would be the ratio . Whenever someone says “middle C is 261.63 Hz”, they are effectively setting (i.e., measuring pitch in units of inverse-time), just as physicists commonly set (measuring momentum in units of inverse-distance). But crucially, for understanding the history of science, this is not possible until you have equipment that is sensitive to the microscopic scale. Before this, you needed two separate systems of units that could not (then) be connected in a principled manner.
The quantum-acoustic analogy is not just conceptual, it is a mathematically precise correspondence, to the point that there are book-length treatments that apply almost equally well to both.^{c} In particular, the Wigner function (previous posts: 1, 2) for simultaneously representing the position and momentum of a particle can be used fruitfully in acoustics for simultaneously representing the duration and pitch of sounds (which is closely related to the short-time Fourier transform). And, importantly, it is true for both quantum mechanics and acoustics that the macroscopic limit is “singular”: Just as only a few suitably indexed families of quantum states have a sensible classical limit, only a few suitably indexed families of acoustic waveforms have a sensible decomposition into notes (“music”). When this limit fails, it’s not a case of bad music, it’s a case of your speakers blowing out.
What makes quantum mechanics “mysterious” then is clearly not things like the form of the uncertainty principle per se. (There is an acoustic uncertainty principle of identical mathematical form.) Rather, it is the interpretation of the wavefunction in terms of probability amplitudes and our inability to probe it except indirectly through disturbing measurements, as opposed to acoustic waves which can be measured to arbitrary precision.^{d } Relatedly, there is, to my understanding, no acoustic analog of a mixed quantum state.^{e }
Basic harmonic analysis of acoustics is an interesting topic in elementary physics in its own right. Maybe teaching more of it (especially in a phase-space formulation using a Wigner function) before presenting quantum mechanics would help students more easily see what’s truly unusual about quantum mechanics and what’s just an unfamiliar mathematical framework.
Physicists often define a Lindbladian superoperator as one whose action on an operator can be written as
(1)
for some operator with positive anti-Hermitian part and some set of operators. But how does one efficiently check if a given superoperator is Lindbladian? In this post I give an “elementary” proof of a less well-known characterization of Lindbladians:
for all . Here “” denotes a partial transpose, is the “superprojector” that removes an operator’s trace, is the identity superoperator, and is the dimension of the space upon which the operators act.
Thus, we can efficiently check if an arbitrary superoperator is Lindbladian by diagonalizing and seeing if all the eigenvalues are positive.
The terms superoperator, completely positive (CP), trace preserving (TP), and Lindbladian are defined below in Appendix A in case you aren’t already familiar with them.
Confusingly, the standard practice is to say a superoperator is “positive” when it is positivity preserving: . This condition is logically independent of the property of a superoperator being “positive” in the traditional sense of being a positive operator, i.e., for all operators (matrices) , where
is the Hilbert-Schmidt inner product on the space of matrices. We will refer frequently to this latter condition, so for clarity we call it op-positivity, and denote it with the traditional notation .
It is reasonably well known by physicists that Lindbladian superoperators, Eq. (1), generate CP time evolution of density matrices, i.e., is completely positive when and satisfies Eq. (1). This evolution is furthermore trace-preserving when is Hermitian.^{a }
Indeed, for any one-parameter family of CP maps obeying the semigroup property that is differentiable about , the family is necessarily generated by some Lindbladian superoperator: . The Hamiltonian and Lindblad operators defining the Lindbladian superoperator in Eq. (1) can be extracted from the eigendecomposition of for small . Although this procedure is highly enlightening, it does not yield an easily “checkable” criterion for when a given superoperator can be put in the Lindbladian form. How could we easily see whether satisfies Eq. (1) without searching exhaustively through all choices of and ?
It has long been known, but is not always widely appreciated by physicists^{b }, that CP maps are exactly those superoperators^{c } that are op-positive under the partial transpose operation^{d }: , where is called the Choi matrix.^{e } (We use the index convention .) This is just one of several elegant relationships between the most important properties of a superoperator, considered as a map on density matrices, and its corresponding Choi matrix :
| Map property | | Choi property |
|---|---|---|
| Map preserves Hermiticity | ⇔ | Choi is Hermitian |
| Map is CP | ⇔ | Choi is op-positive |
| Map is trace-preserving | ⇔ | Unit “outer” trace of Choi |
| Map is unital | ⇔ | Unit “inner” trace of Choi |
Here, we define the “outer” and “inner” partial traces^{f } as, respectively,
The equivalences in the above table can all be checked explicitly with index manipulation.
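For the numerically inclined, the CP and trace-preserving rows of the table can also be spot-checked in a few lines. This is my own sketch, and it assumes one particular stacking convention for the Choi matrix (block (i, j) of C equals the map applied to |i⟩⟨j|); the index convention used in this post's equations may differ.

```python
import numpy as np

d = 3
rng = np.random.default_rng(0)

def choi(phi, d):
    """Choi matrix with the (assumed) convention C[(i,k),(j,l)] = phi(|i><j|)[k,l]."""
    C = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            E = np.zeros((d, d), dtype=complex)
            E[i, j] = 1
            C[i * d:(i + 1) * d, j * d:(j + 1) * d] = phi(E)
    return C

A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
kraus = lambda X: A @ X @ A.conj().T   # CP by construction (a single Kraus operator)
transp = lambda X: X.T                 # positivity-preserving but famously not CP

# Map is CP  <=>  Choi is op-positive:
assert np.linalg.eigvalsh(choi(kraus, d)).min() > -1e-9
assert np.linalg.eigvalsh(choi(transp, d)).min() < -0.5   # Choi is the swap; eigenvalue -1

# Map is trace-preserving  <=>  unit partial trace of Choi (in this convention):
U = np.linalg.qr(A)[0]                 # a unitary, so X -> U X U† is trace-preserving
C = choi(lambda X: U @ X @ U.conj().T, d)
outer = np.einsum('ikjk->ij', C.reshape(d, d, d, d))
assert np.allclose(outer, np.eye(d))
```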
We can use the first two equivalences in the table to show the following.
We define to be the superprojector that removes an operator’s trace, , so that .
Totally true fact: These two statements are equivalent:
Proof: First, we’ll show that (1) implies (2).
If is CP, then . For this to hold for arbitrarily small , it must also be true when dropping the terms for all sufficiently small , i.e., for all positive below some threshold. Then we use our first lemma, which is proved in Appendix B.
Lemma 1: The superoperator is an op-positive superoperator for all sufficiently small if and only if is an op-positive superoperator.
Applying Lemma 1, we conclude that (1) implies (2).
Now we’ll show that (2) implies (1). If is op-positive, then by Lemma 1 we know for sufficiently small . Now we make use of our second lemma, also proved in Appendix B.
Lemma 2: If the partial transpose of a superoperator is op-positive, then the partial transposes of all positive powers of that superoperator are also op-positive, i.e., for all positive integers .
If for sufficiently small , then by Lemma 2 we know, for the same values of , that the object
is an op-positive superoperator (for any positive integer ). If we define
then one can check that and . Since is a mixture (convex combination) of op-positive superoperators, it is itself an op-positive superoperator for all , and hence its limit
is also an op-positive superoperator. This makes completely positive for sufficiently small , but since complete positivity is preserved under composition we conclude that is CP for all . ☐
(2)
to be an equivalent definition of what it means to say that is a Lindbladian superoperator. It can be supplemented with the trace-preserving condition (implying for all ) to define the subset of Lindbladians generating CPTP evolution. Lindblad called this subset “completely dissipative”, and it is equivalent to Eq. (2) with Hermitian .
Although Eq. (2) is not as useful as Eq. (1) for understanding the action of a Lindbladian, it is much easier to use Eq. (2) to check whether a given superoperator is Lindbladian.
With a bit of manipulation, we can re-write Eq. (2) in a more quantum-information-y (and less linear-algebraic) way:
where is some maximally entangled state and projects onto the orthogonal subspace.^{g } (This condition is independent of the choice of basis and hence the choice of maximally entangled state.)
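Here is a small numerical sketch of that check (my own; it uses the standard conditional-complete-positivity form Q C Q ⪰ 0, where C is the Choi matrix of the candidate Lindbladian and Q projects off the maximally entangled state; the notation is mine, since the equation above is not reproduced in this post's text):

```python
import numpy as np

d = 2
H = np.array([[0, 0.5], [0.5, 0]], dtype=complex)   # Hamiltonian part
Z = np.array([[1, 0], [0, -1]], dtype=complex)      # a single Lindblad (dephasing) operator

def lind(X):
    """Dephasing Lindbladian L(X) = -i[H,X] + Z X Z - X (using Z†Z = 1)."""
    return -1j * (H @ X - X @ H) + Z @ X @ Z - X

def choi(phi, d):
    """Choi matrix C = sum_ij |i><j| (x) phi(|i><j|)."""
    e = np.eye(d)
    return np.block([[phi(np.outer(e[:, i], e[:, j])) for j in range(d)] for i in range(d)])

w = np.eye(d).reshape(-1) / np.sqrt(d)   # a maximally entangled state |Omega>
Q = np.eye(d * d) - np.outer(w, w)       # projector onto its orthogonal complement

# A genuine Lindbladian passes the conditional-complete-positivity test ...
assert np.linalg.eigvalsh(Q @ choi(lind, d) @ Q).min() > -1e-9

# ... while a Hermiticity-preserving, trace-annihilating non-Lindbladian fails it:
assert np.linalg.eigvalsh(Q @ choi(lambda X: X.T - X, d) @ Q).min() < -0.5
```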
If you’re interested in learning more, Tarasov’s “Quantum Mechanics of Non-Hamiltonian and Dissipative Systems” is the most thorough yet readable monograph I’ve found.
For our purposes, superoperators are just linear operators on the vector space of linear operators.^{h } If we represent finite-dimensional linear operators as matrices, then superoperators are matrices. A superoperator can be indexed as and its action on an operator is given by the matrix elements
Here, are the matrix elements of and the parentheses on just emphasize that we treat this as a joint index (taking values) of . (It does not denote antisymmetrization.)
Lindbladians are the subset of superoperators that can be put in the form
for some Hermitian operator (the Hamiltonian) and some set of operators (the Lindblad operators), where of course . You can express this more elegantly as
where “” is just a tensor product with a bit of syntactic sugar: Given any two operators and , the superoperator is defined to have action .
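As a concrete illustration (my own sketch, assuming the standard column-stacking vectorization vec(AXB) = (Bᵀ ⊗ A) vec(X)), the ⊙ sugar makes it easy to build the matrix of a Lindbladian from the Hamiltonian and Lindblad operators:

```python
import numpy as np

def sandwich(A, B):
    """Matrix of the superoperator A ⊙ B, i.e. X -> A X B†, in the column-stacked vec basis."""
    return np.kron(B.conj(), A)

def lindbladian(H, Ls):
    """Matrix of L(X) = -i[H, X] + sum_k ( L_k X L_k† - (1/2){L_k† L_k, X} )."""
    d = H.shape[0]
    I = np.eye(d)
    L = -1j * (sandwich(H, I) - sandwich(I, H))   # I ⊙ H sends X -> X H† = X H for Hermitian H
    for Lk in Ls:
        LdL = Lk.conj().T @ Lk
        L += sandwich(Lk, Lk) - 0.5 * (sandwich(LdL, I) + sandwich(I, LdL))
    return L

# Sanity check: Tr L(X) = 0 for every X, i.e. vec(identity) annihilates the matrix from the left.
H = np.array([[1, 0], [0, -1]], dtype=complex)
Ls = [np.array([[0, 1], [0, 0]], dtype=complex)]   # a decay operator, for illustration
assert np.allclose(np.eye(2).reshape(-1) @ lindbladian(H, Ls), 0)
```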
As explained near the beginning, we distinguish two notions of superoperator “positivity”:
When the first condition holds, people usually just say that is a “positive map”, but this can be confusing because it is logically independent of being a “positive operator” when thought of, naturally enough, as an operator acting on the space of matrices (the second condition).
A superoperator is said to be completely positive (CP) when is positive for all positive integers , where is the identity superoperator on a separate space of matrices. The tensor product on superoperators is naturally defined as , extended by linearity. (Note that we do not necessarily require CP maps to preserve operator trace.) Complete positivity is a strengthening of positivity preservation (not op-positivity).
Lemma 1: The superoperator is an op-positive superoperator for all sufficiently small if and only if is an op-positive superoperator.
Proof. For a fixed vector and Hermitian operator , consider the family of Hermitian operators for all . If another vector has non-zero overlap with , then
is positive for sufficiently small . Therefore, if there is a such that for arbitrarily small , we know that vector is orthogonal to . Such a vector exists if and only if is not a positive operator, where .☐
Lemma 2: If the partial transpose of a superoperator is op-positive, then the partial transposes of all positive powers of that superoperator are also op-positive, i.e., for all positive integers .
Proof. If is positive then it has the eigendecomposition or, with indices, . The eigenvalues are positive and the eigenoperators are orthonormal under the Hilbert-Schmidt inner product. The previous expression also gives us the matrix elements for the original superoperator, allowing us to use simple index manipulation to show that
This expression is manifestly positive since it’s a mixture (convex combination) of superprojectors, so we conclude for positive integers .☐
Remark. This is equivalent to the statement that complete positivity is preserved by composition.
In this post I review the 2010 book “Lifecycle Investing” by Ian Ayres and Barry Nalebuff. (Amazon link here; no commission received.) They argue that a large subset of investors should adopt a (currently) unconventional strategy: One’s future retirement contributions should effectively be treated as bonds in one’s retirement portfolio that cannot be efficiently sold; therefore, early in life one should balance these low-volatility assets by gaining exposure to volatile high-return equities that will generically exceed 100% of one’s liquid retirement assets, necessitating some form of borrowing.
“Lifecycle Investing” was recommended to me by a friend who said the book “is extremely worth reading…like learning about index funds for the first time…Like worth paying >1% of your lifetime income to read if that was needed to get access to the ideas…potentially a lot more”. Ayres and Nalebuff lived up to this recommendation. Eventually, I expect the basic ideas, which are simple, to become so widespread and obvious that it will be hard to remember that it required an insight.
In part, what makes the main argument so compelling is that (as shown in the next section) it is closely related to an elegant explanation for something we all knew to be true — you should increase the bond-stock ratio of your portfolio as you get older — yet previously had bad justifications for. It also gives new actionable, non-obvious, and potentially very important advice (buy equities on margin when young) that is appropriately tempered by real-world frictions. And, most importantly, it means I personally feel less bad about already being nearly 100% in stocks when I picked up the book.
My main concerns, which are shared by other reviewers and which are only partially addressed by the authors, are:
By far the best review of this book I’ve found after a bit of Googling is the one by Fredrick Vars, a law professor at the University of Alabama: [PDF]. Read that. I wrote most of my review before reading Vars’s, and he anticipated almost all of my concerns while offering illuminating details on some of the legal aspects.
One way to frame the insight, slightly different than as presented in the book, is as arising out of a solution to a basic puzzle.
The vast majority of financial advisors agree that retirement investments should have a higher percentage of volatile assets (stocks, essentially) when the person is young and less when they are old. This is often justified by the argument that volatile returns can be averaged out over the years, but, taken naively, this is flat out wrong. As Alex Tabarrok puts it^{a }
Many people think that uncertainty washes out when you buy and hold for a long period of time. Not so, that is the fallacy of time diversification. Although the average return becomes more certain with more periods you don’t get the average return you get the total payoff and that becomes more uncertain with more periods.
More quantitatively: When a principal is invested over years in a fund with a given annual expected return and volatility (standard deviation) , the average of the yearly returns becomes more certain for more years and approaches for the usual central-limit reasons. However, your payout is not the average return! Rather, your payout is the compounded amount^{b }
and the uncertainty of that does not go down with more time…even in percentage terms. That is, the ratio of the standard deviation in payout to the mean payout, , goes up the larger the number of years that the principal is invested.
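A quick Monte Carlo sketch makes the distinction vivid (my own code; the 7% mean return and 18% volatility are illustrative guesses, not the book's numbers): the average yearly return concentrates as the horizon grows, while the relative spread of the compounded payout grows.

```python
import numpy as np

def payout_stats(years, mu=0.07, sigma=0.18, n_paths=200_000, seed=1):
    """Std of the average yearly return, and std/mean of the compounded payout of $1."""
    rng = np.random.default_rng(seed)
    returns = rng.normal(mu, sigma, size=(n_paths, years))
    avg_return = returns.mean(axis=1)        # this DOES concentrate around mu...
    payout = np.prod(1 + returns, axis=1)    # ...but the payout is what you actually get
    return avg_return.std(), payout.std() / payout.mean()

for years in (1, 10, 40):
    avg_spread, rel_payout_spread = payout_stats(years)
    print(f"{years:2d} years: std of average return {avg_spread:.3f}, "
          f"relative payout spread {rel_payout_spread:.2f}")
```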
Sometimes when confronted with this mathematical reality people backtrack to a justification like this: If you are young and you take a large downturn, you can adapt to this by absorbing the loss over many years of slightly smaller future consumption (adaptation), but if you are older you must drastically cut back, so the hit to your utility is larger. This is a true but fairly minor consideration. Even if we knew we would be unable to adapt our consumption (say because it was dominated by fixed costs), it would still be much better to be long on stocks when young and less when old.
Another response is to point out that, although absolute uncertainty in stock performance goes up over time, the odds of beating bonds also keep going up. That is, on any given day the odds that stocks outperform bonds is maybe only a bit better than a coin flip, but as the time horizon grows, the odds get progressively better.^{c} This is true, but some thought shows it’s not a good argument. In short, even if the chance of doing worse than bonds keeps falling, the distribution of scenarios where you lose to bonds could get more and more extreme; when you do worse, maybe you do much worse. (For an extensive explanation, see the section “Probability of Shortfall” in John Norstad’s “Risk and Time”, which Tabarrok above linked to as “fallacy of time diversification”.) This, it turns out, is not true — we see below that stocks do in fact get safer over time — but the possibility of extreme distributions shows why the probability-of-beating-bonds-goes-up-over-time argument is unsound.
To neatly resolve this puzzle, the authors make a strong simplifying assumption. (Importantly, the main idea is robust to relaxing this assumption somewhat,^{d } but for now let’s accept it in its idealized form.)
The main assumption is that the portion of your future income that you will be saving for retirement (e.g., your stream of future 401(k) contributions) can be predicted with relative confidence and is financially equivalent to holding today a sequence of bonds that pay off on a regular schedule in the future (but cannot be sold). When we consider how our retirement portfolio today should be split between bonds and stocks, we should include the net present value of our future contributions. That is the main idea.
Under some not-unreasonable simplifying assumptions, Samuelson and Merton showed long ago^{e } that if, counterfactually, you had to live off an initial lump sum of wealth, then the optimal way to invest that sum would be to maintain a constant split between assets of different volatility (e.g., 40% stocks and 60% bonds), with the appropriate split determined by your personal risk tolerance. However, even though you won’t magically receive your future retirement contributions as a lump sum in real life, it follows that if those contributions were perfectly predictable, and if you could borrow money at the risk-free rate, then you should borrow against your future contributions, converting them to their net present value, and keep the same constant fraction of the money in the stock market. Starting today.
Crucially, when you are young your liquid retirement portfolio (the sum of your meager contributions up to that point, plus a bit of accumulated interest) is dwarfed by your expected future contributions. Even if you invest 100% of your retirement account into stocks, you are insufficiently exposed to the stock market. In order to get sufficient stock exposure, you should borrow lots of money at the risk-free rate and put it in the stock market. It is only as you get older, when the ratio between your retirement account and the present value of future earnings increases, that you should move more and more of your (visible) retirement account into regular bonds.
The resolution of the puzzle is that the optimal portfolio (in the idealized case) only looks like it’s stock-heavy early in life because you’re forgetting about your stream of future retirement contributions (a portion of your future salary), which, the authors claim, is essentially like a bond that can’t be traded.
(If the above concept isn’t immediately compelling to you, my introduction has failed. Close this blog and just go read the first couple chapters of their book.)
Most of the book is devoted to fleshing out and defending the implications of this idea for the real world where there are a variety of complications, most notably that you cannot borrow unlimited amounts at the risk-free rate. Nevertheless, the authors conclude that when many people are young they should buy equities on margin (i.e., with borrowed money) up to 2:1 leverage, at least if they have access to low enough interest rates to make it worthwhile.
The organization of the chapters is as follows:
In general the authors compare their lifecycle investing strategy to two conventional strategies: the “birthday rule” (aka an “age-in-bonds rule”), where the investor allocates a percentage of their portfolio to stocks given by 100 (or 110) minus their age, and the “constant percentage rule”, where the investor keeps a constant fraction of their portfolio in stocks.
In Chapter 3, the authors argue that the lifecycle strategy consistently beats conventional strategies when (a) holding fixed expected return and minimizing variance, (b) holding fixed variance and maximizing expected return, (c) holding fixed very bad (first-percentile) returns while maximizing expected return. If you look at a hypothetical ensemble of investors on historical data, one retiring during each year between 1914 and 2010 (when the book was published), every single investor would have had more at retirement by adopting the lifecycle strategy, generally by an enormous 50% or more. Here’s the total return of the investors vs. retirement year depending on whether they followed the lifecycle strategy, the birthday rule, or the constant percentage rule:
And here are the quantiles:
Although they rely on historical simulations for this, it’s really grounded in a very simple theoretical idea: your liquid retirement portfolio is extremely small when you’re young, so for any plausible level of risk aversion, you are better off leveraging equities initially.
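The flavor of these comparisons can be reproduced with a toy Monte Carlo. This is not the authors' methodology (they use historical return sequences); every parameter below is an assumed, illustrative value, and the "lifecycle" rule is a crude stand-in for the real strategy. All three rules are run against identical simulated markets:

```python
import numpy as np

rng = np.random.default_rng(0)

YEARS = 40              # length of working career (assumed)
CONTRIB = 10_000        # annual contribution, constant in real terms (assumed)
MU, SIGMA = 0.07, 0.18  # assumed real equity return and volatility
RF = 0.02               # assumed risk-free / bond return
MARGIN = 0.03           # assumed borrowing rate when leveraged
N = 10_000              # number of simulated careers

# Draw the return paths once so every strategy faces the same markets.
RETURNS = rng.normal(MU, SIGMA, size=(YEARS, N))

def simulate(stock_frac):
    """Terminal wealth for a rule mapping year -> stock fraction (may exceed 1)."""
    wealth = np.zeros(N)
    for t in range(YEARS):
        wealth = wealth + CONTRIB
        f = stock_frac(t)
        if f <= 1:
            r = f * RETURNS[t] + (1 - f) * RF
        else:  # leveraged: borrow (f - 1) of wealth at the margin rate
            r = f * RETURNS[t] - (f - 1) * MARGIN
        wealth = np.maximum(wealth * (1 + r), 0.0)  # limited liability: floor at zero
    return wealth

# Three rules, for an investor starting work at age 25:
birthday  = simulate(lambda t: (100 - (25 + t)) / 100)  # age-in-bonds
constant  = simulate(lambda t: 0.75)                    # constant 75% stocks
lifecycle = simulate(lambda t: 2.0 if t < 15 else 0.8)  # crude: 2:1 leverage early

for name, w in [("birthday", birthday), ("constant", constant), ("lifecycle", lifecycle)]:
    print(f"{name:10s} median={np.median(w):>12,.0f}  p10={np.percentile(w, 10):>12,.0f}")
```

Even this crude version shows the mechanism: leverage early in life, when the portfolio is small relative to future contributions, spreads market risk across more years.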
Chapter 4 considers more testing variations: international stock returns, Monte Carlo simulations with historically anomalous stock performance, higher interest rates, etc. They also show the strategy can easily be modified to incorporate (possibly EMH-violating) beliefs about one’s ability to time the market. (The authors use Robert Shiller’s cyclically adjusted price-to-earnings ratio for this, which they neither endorse nor reject.)
In Chapter 7, the authors draw on the work of Samuelson and Merton to address the key question: what is the constant fraction in stocks that you should be targeting anyways? Under certain idealized assumptions, the optimal “Samuelson share” S of the portfolio to have invested in stocks is

S = λ/(γσ²)

where λ is the equity risk premium (the expected return of stocks in excess of the risk-free rate), σ is the volatility (standard deviation) of stock returns, and γ is the investor’s coefficient of relative risk aversion.
The authors give reasons to be wary of taking this formula too seriously, especially because it’s not so easy to know what risk aversion you should choose (discussed more below). However, it is very notable that as equity volatility increases — say, because the world is gripped by a global pandemic — the appropriate fraction of the portfolio exposed to the stock market drops drastically. The authors suggest using the VIX to estimate equity volatility and rebalancing your portfolio appropriately when that metric changes. Continuously hitting the correct Samuelson share without shooting yourself in the foot looks hard in practice, as the authors admit. Still, there is so much to gain from leverage that you can very likely collect a good chunk of the upside even with a conservative and careful approach.
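The volatility sensitivity is easy to see numerically. Here is the standard Samuelson–Merton share, equity premium divided by (risk aversion times variance), evaluated at assumed illustrative numbers (none of these parameter values come from the book):

```python
def samuelson_share(equity_premium, volatility, risk_aversion):
    """Optimal stock fraction: premium / (risk aversion * variance)."""
    return equity_premium / (risk_aversion * volatility ** 2)

# Assumed illustrative values: 4% equity premium, 18% annualized volatility,
# relative risk aversion of 2.
print(samuelson_share(0.04, 0.18, 2.0))  # ~0.62

# In a volatility spike (e.g. implied vol around 40%), the share plummets:
print(samuelson_share(0.04, 0.40, 2.0))  # ~0.125
```

Because volatility enters squared, a doubling of the VIX cuts the appropriate equity exposure by roughly a factor of four.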
The first general point of caution tempers (but definitely does not eliminate) the suggestion to invest in equities on margin: one’s risk tolerance is not an easy thing to elicit. To a large extent we do this by imagining various outcomes, deciding which outcomes we would prefer, and then inferring (with regularity assumptions) what our risk tolerance must be. Therefore, it would likely be a mistake to immediately take whatever risk tolerance you previously thought you had as deployed in conventional investment strategies and then follow the advice in this book. After introspection, I’ve sorta decided that although I am still less risk averse than the general population, I’m more risk averse than I thought because I was following the intuition (which I can now justify better) that I should be heavy in stocks at my age. The authors address the general difficulty of someone identifying their own risk tolerance (e.g., how dependent it is on framing effects), but they do not discuss how your beliefs about your risk tolerance might be entangled with what investment strategy you have previously been using.
However, this bears repeating: for every level of risk tolerance, there exists a form of this strategy that beats (both in expectation and in risk) the best conventional strategy. The fact that, when young, you are buying stocks on margin makes it tempting to interpret this strategy as only being good when one is not very risk averse or when the stock market has a good century. But for any time-homogeneous view you have on what stocks will do in the future, there is a version of this strategy that is better than a conventional strategy. (A large fraction of casual critics seem to miss this point.) The authors muddy this central feature a bit because, on my reading, they are a bit less risk averse than the average person. The book would have been more pointed if they had erred toward risk aversion in their various examples of the lifecycle strategy.
The second point of caution is gestured at in the criticism by Nobel winner Paul Samuelson^{f }. (He was also a mentor of the authors.) The costs of going truly bust would be catastrophic:
The ideas that I have been criticizing do not shrivel up and die. They always come back… Recently I received an abstract for a paper in which a Yale economist and a Yale law school professor advise the world that when you are young and you have many years ahead of you, you should borrow heavily, invest in stocks on margin, and make a lot of money. I want to remind them, with a well-chosen counterexample: I always quote from Warren Buffett (that wise, wise man from Nebraska) that in order to succeed, you must first survive. People who leverage heavily when they are very young do not realize that the sky is the limit of what they could lose and from that point on, they would be knocked out of the game.
The authors respond to these sorts of concerns by emphasizing that (1) the risk of losing everything is highest when you are very young, which is exactly when the amount you have in your retirement account is very small, and (2) they are recommending adding leverage to your retirement account, not all your assets. If you expect the total of your retirement contributions to be roughly $1 million by the time you retire, losing $20,000 and zeroing out your retirement account when you are 25 is not catastrophic (and is still a rare outcome under their strategy). You should still have a rainy day fund, and you’ll just earn more money in the future.
However, I don’t think this response seriously grapples with the best concrete form of the wary intuition many people have to their strategy. I think the main problem is that most people are implicitly using their retirement account not just as a place to save for retirement assuming a normal healthy life, but also as a rainy day fund for a variety of bad events. In the US, 12% of people are disabled; I don’t know how much you can push down those odds knowing you are healthy at a given time, but it seems like you need to allow for a ~3% chance you are partially or totally disabled at some point. Although people buy disability insurance, they also know that if they ever needed to tap into their retirement account they could, possibly with a modest tax penalty. (Likewise for other unforeseen crises.)
Another way to say this: your future earnings are substantially more likely to fail than the US government, so they cannot be idealized as a bond. By purchasing the right insurance, keeping enough in a savings account, etc., I’m sure there’s a way to hedge against this, and I’m confident the core ideas in this book survive this necessary hedging. But I would have liked the authors to discuss how to do that in at least as much concrete detail as they describe the mechanics of how to invest on margin.^{g } If people have been relying on the conventional strategy and have consequently been implicitly enjoying a form of buffer/insurance, it is paramount to highlight this and find a substitute before moving on to an unconventional strategy that lacks that buffer.
Now, if we only had to insure against tail risks, that would be fine, but there is an extreme version of this issue that has the potential to undermine the entire idea: why is my future income stream like a bond rather than a stock? I have a ton of uncertainty about how my income will increase in the future. Indeed, personally, I think I trust the steady growth of the stock market more! The authors do advise against adopting their strategy if your future income stream is highly correlated with the market (e.g., you’re a banker), but they don’t get very quantitative, and they don’t say much about what to do if that stream is highly volatile but not very correlated with stocks. (Sure, if it’s uncorrelated then you’ll want to match your “investment” in your future income stream with some actual investment in stocks for diversification, but how much should this overall high volatility change the strategy?^{h })
It will take some time before I have mulled this over enough to even start assessing whether I should be investing with significant leverage. It seems pretty plausible to me that my future income is much more uncertain than a bond, although that’s something I’ll need to meditate on.
I, like the authors, really wish there was a mutual fund that automatically implemented this strategy, like target-date funds do for (strategies similar to) the birthday rule. At the very least it would induce pointed discussion about the benefits and risks of the strategy. Unfortunately, a decade after this book was released there is no such option and, as the authors admit in the book, concretely implementing the strategy yourself in the real world can be a headache.
However, because of this book I can at least feel less guilty for being overwhelmingly in equities. After finishing this book I finally exchanged much of my remaining Vanguard 2050 target-date funds, which contain bonds, for pure equity index funds. I had been keeping them around in part because going 100% equities felt vaguely dangerous. Now that there is a good argument that the optimal allocation is greater than 100% equities — though that is by no means assured — this no longer feels so extreme. Crossing the 100% barrier by acquiring leverage involves many real-world complications, but in the platonic realm there is nothing special about the divide.
This has been cross-posted to LessWrong (comments).