[Summary: The constant evolution of tests for what counts as worryingly powerful AI is mostly a consequence of how hard it is to design tests that will identify the real-world power of future automated systems. I argue that Alan Turing in 1950 could not reliably distinguish a typical human from an appropriately fine-tuned GPT-4, yet all our current automated systems cannot produce growth above historic trends.]
What does the phenomenon of “moving the goalposts” for what counts as AI tell us about AI?
It’s often said that people repeatedly revising their definition of AI, often in response to previous AI tests being passed, is evidence that people are denying/afraid of reality, and want to put their head in the sand or whatever. There’s some truth to that, but that’s a comment about humans and I think it’s overstated.
Closer to what I want to talk about is the idea that AI is continuously redefined to mean “whatever humans can do that hasn’t been automated yet”, which is often taken to be evidence that AI is not a “natural” kind out there in the world, but rather just a category relative to current tech. There’s also truth to this, but it's not exactly what I’m interested in.
To me, it is startling that (I claim) we have systems today that would likely pass the Turing test if administered by Alan Turing, but that have negligible impact on a global scale. More specifically, consider fine-tuning GPT-4 to mimic a typical human who lacks encyclopedic knowledge of the contents of the internet. Suppose that it’s mimicking a human with average intelligence whose occupation has no overlap with Alan Turing’s expertise.… [continue reading]
Here’s a collection of reviews of the arguments that artificial general intelligence represents an existential risk to humanity. They vary greatly in length and style. I may update this from time to time.
Here, from my perspective, are some different true things that could be said, to contradict various false things that various different people seem to believe, about why AGI would be survivable on anything remotely resembling the current pathway, or any other pathway we can easily jump to.
This report examines what I see as the core argument for concern about existential risk from misaligned artificial intelligence. I proceed in two stages. First, I lay out a backdrop picture that informs such concern. On this picture, intelligent agency is an extremely powerful force, and creating agents much more intelligent than us is playing with fire -- especially given that if their objectives are problematic, such agents would plausibly have instrumental incentives to seek power over humans. Second, I formulate and evaluate a more specific six-premise argument that creating agents of this kind will lead to existential catastrophe by 2070. On this argument, by 2070: (1) it will become possible and financially feasible to build relevantly powerful and agentic AI systems; (2) there will be strong incentives to do so; (3) it will be much harder to build aligned (and relevantly powerful/agentic) AI systems than to build misaligned (and relevantly powerful/agentic) AI systems that are still superficially attractive to deploy; (4) some such misaligned systems will seek power over humans in high-impact ways; (5) this problem will scale to the full disempowerment of humanity; and (6) such disempowerment will constitute an existential catastrophe.
… [continue reading]
Tyler John & William MacAskill have recently released a preprint of their paper “Longtermist Institutional Reform” [PDF]. The paper is set to appear in an EA-motivated collection “The Long View” (working title), from Natalie Cargill and Effective Giving.
Here is the abstract:
There is a vast number of people who will live in the centuries and millennia to come. In all probability, future generations will outnumber us by thousands or millions to one; of all the people who we might affect with our actions, the overwhelming majority are yet to come. In the aggregate, their interests matter enormously. So anything we can do to steer the future of civilization onto a better trajectory, making the world a better place for those generations who are still to come, is of tremendous moral importance. Political science tells us that the practices of most governments are at stark odds with longtermism. In addition to the ordinary causes of human short-termism, which are substantial, politics brings unique challenges of coordination, polarization, short-term institutional incentives, and more. Despite the relatively grim picture of political time horizons offered by political science, the problems of political short-termism are neither necessary nor inevitable. In principle, the State could serve as a powerful tool for positively shaping the long-term future. In this chapter, we make some suggestions about how we should best undertake this project. We begin by explaining the root causes of political short-termism. Then, we propose and defend four institutional reforms that we think would be promising ways to increase the time horizons of governments: 1) government research institutions and archivists; 2) posterity impact assessments; 3) futures assemblies; and 4) legislative houses for future generations.
… [continue reading]
In this post I review the 2010 book “Lifecycle Investing” by Ian Ayres and Barry Nalebuff. (Amazon link here; no commission received.) They argue that a large subset of investors should adopt a (currently) unconventional strategy: One’s future retirement contributions should effectively be treated as bonds in one’s retirement portfolio that cannot be efficiently sold; therefore, early in life one should balance these low-volatility assets by gaining exposure to volatile high-return equities that will generically exceed 100% of one’s liquid retirement assets, necessitating some form of borrowing.
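The arithmetic behind this claim can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' own calculation: the investor, the 60% target equity share, the 2% discount rate, and all dollar figures are hypothetical, and future contributions are treated as a risk-free, bond-like asset.

```python
def present_value(annual_contribution, years, discount_rate):
    """Present value of equal end-of-year contributions (the bond-like asset)."""
    return sum(annual_contribution / (1 + discount_rate) ** t
               for t in range(1, years + 1))


def equity_exposure(liquid_savings, annual_contribution, years,
                    target_equity_share, discount_rate=0.02):
    """Desired equity holdings as a multiple of liquid savings.

    Values above 1.0 mean the target can only be reached by borrowing.
    """
    future_contributions = present_value(annual_contribution, years, discount_rate)
    total_wealth = liquid_savings + future_contributions
    desired_equity = target_equity_share * total_wealth
    return desired_equity / liquid_savings


# Hypothetical investors: one early in a career, one close to retirement.
young = equity_exposure(20_000, 10_000, 35, 0.6)
older = equity_exposure(600_000, 10_000, 5, 0.6)
print(f"young: {young:.2f}x of liquid savings, older: {older:.2f}x")
```

With these made-up inputs the young investor's target exceeds 100% of liquid savings while the older investor's does not. The raw multiple ignores borrowing costs and any limit on leverage.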
“Lifecycle Investing” was recommended to me by a friend who said the book “is extremely worth reading…like learning about index funds for the first time…Like worth paying 1% of your lifetime income to read if that was needed to get access to the ideas…potentially a lot more”. Ayres and Nalebuff lived up to this recommendation. Eventually, I expect the basic ideas, which are simple, to become so widespread and obvious that it will be hard to remember that it required an insight.
In part, what makes the main argument so compelling is that (as shown in the next section), it is closely related to an elegant explanation for something we all knew to be true — you should increase the bond-stock ratio of your portfolio as you get older — yet previously had bad justifications for. It also gives new actionable, non-obvious, and potentially very important advice (buy equities on margin when young) that is appropriately tempered by real-world frictions. And, most importantly, it means I personally feel less bad about already being nearly 100% in stocks when I picked up the book.
My main concerns, which are shared by other reviewers and which are only partially addressed by the authors, are:
- Future income streams might be more like stocks than bonds for the large majority of people.
… [continue reading]
[Tina White is a friend of mine and co-founder of COVID Watch, a promising app for improving contact tracing for the coronavirus while preserving privacy. I commissioned Tom Higgins to write this post in order to bring attention to this important project and put it in context of related efforts. -Jess Riedel]
Countries around the world have been developing mobile phone apps to alert people to potential exposure to COVID-19. There are two main mechanisms used:
- Monitoring a user’s location, comparing it to an external (typically, government) source of information about infections, and notifying the user if they are entering, or previously entered, a high-risk area.
- Detecting when two users come in close proximity to each other and then, if one user later reports to have been infected, notifying the second user and/or the government.
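The second mechanism in the list above can be sketched as a toy contact log. This is an illustration only, not how any particular app works: the `Phone` class and `users_to_notify` function are hypothetical names, and the sketch stores plain user names, which a privacy-preserving design would presumably avoid.

```python
from dataclasses import dataclass, field


@dataclass
class Phone:
    """Hypothetical device that remembers which other devices it has been near."""
    user: str
    contacts: set = field(default_factory=set)

    def record_proximity(self, other):
        # Both phones log the encounter, however the proximity was detected.
        self.contacts.add(other.user)
        other.contacts.add(self.user)


def users_to_notify(phones, infected_user):
    """Return users whose phones logged a contact with the person reporting infection."""
    return sorted(p.user for p in phones if infected_user in p.contacts)


alice, bob, carol = Phone("alice"), Phone("bob"), Phone("carol")
alice.record_proximity(bob)
bob.record_proximity(carol)
print(users_to_notify([alice, bob, carol], "alice"))
```

Only direct contacts are notified here: carol met bob but never alice, so a report from alice does not reach her.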
The first mechanism generally uses the phone’s location data, which is largely inferred from GPS. The second method can also be accomplished with GPS, by simply measuring the distance between users, but it can instead be accomplished with phone-to-phone Bluetooth connections … [continue reading]
People often say to me “Jess, all this work you do on the foundations of quantum mechanics is fine as far as it goes, but it’s so conventional and safe. When are you finally going to do something unusual and take some career risks?” I’m now pleased to say I have a topic to bring up in such situations: the thermodynamic incentives of powerful civilizations in the far future who seek to perform massive computations. Anders Sandberg, Stuart Armstrong, and Milan M. Ćirković previously argued for a surprising connection between Landauer’s principle and the Fermi paradox, which Charles Bennett, Robin Hanson, and I have now critiqued. Our comment appeared today in the new issue of Foundations of Physics:
In their article [arXiv:1705.03394], 'That is not dead which can eternal lie: the aestivation hypothesis for resolving Fermi's paradox', Sandberg et al. try to explain the Fermi paradox (we see no aliens) by claiming that Landauer's principle implies that a civilization can in principle perform far more (~10^30 times more) irreversible logical operations (e.g., error-correcting bit erasures) if it conserves its resources until the distant future when the cosmic background temperature is very low. So perhaps aliens are out there, but quietly waiting. Sandberg et al. implicitly assume, however, that computer-generated entropy can only be disposed of by transferring it to the cosmological background. In fact, while this assumption may apply in the distant future, our universe today contains vast reservoirs and other physical systems in non-maximal entropy states, and computer-generated entropy can be transferred to them at the adiabatic conversion rate of one bit of negentropy to erase one bit of error.
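For readers who want the scaling behind the 10^30 figure, here is a rough numerical sketch. It assumes the standard form of Landauer's bound (at least k_B T ln 2 of energy per erased bit) and a present background temperature of about 2.725 K. The function names are hypothetical, and the implied future temperature is simply back-calculated from the quoted factor rather than taken from either paper.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact in SI)


def landauer_cost(temperature_kelvin):
    """Minimum energy in joules to erase one bit at the given temperature."""
    return K_B * temperature_kelvin * math.log(2)


def erasure_gain(t_now, t_future):
    """Factor by which erasures per joule grow when dumping entropy at t_future."""
    return landauer_cost(t_now) / landauer_cost(t_future)


T_BACKGROUND_TODAY = 2.725  # K, approximate present background temperature (assumed)
QUOTED_GAIN = 1e30          # the "~10^30 times more" figure from the abstract

# Because the bound is linear in T, the gain is just a ratio of temperatures.
implied_future_temperature = T_BACKGROUND_TODAY / QUOTED_GAIN

print(f"cost per erased bit today: {landauer_cost(T_BACKGROUND_TODAY):.3e} J")
print(f"implied future temperature: {implied_future_temperature:.3e} K")
```

The point of the comment is that this temperature ratio only sets the gain if the background is the sole place to dump entropy.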
… [continue reading]
Here is the first result out of the project Verifying Deep Mathematical Properties of AI Systems funded through the Future of Life Institute.
Noisy data, non-convex objectives, model misspecification, and numerical instability can all cause undesired behaviors in machine learning systems. As a result, detecting actual implementation errors can be extremely difficult. We demonstrate a methodology in which developers use an interactive proof assistant to both implement their system and to state a formal theorem defining what it means for their system to be correct. The process of proving this theorem interactively in the proof assistant exposes all implementation errors since any error in the program would cause the proof to fail. As a case study, we implement a new system, Certigrad, for optimizing over stochastic computation graphs, and we generate a formal (i.e. machine-checkable) proof that the gradients sampled by the system are unbiased estimates of the true mathematical gradients. We train a variational autoencoder using Certigrad and find the performance comparable to training the same model in TensorFlow.
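To make concrete what "unbiased estimates of the true mathematical gradients" means, here is a toy numerical spot check. It is not Certigrad's code and not a formal proof: it uses a hypothetical one-parameter model, x ~ N(theta, 1) with objective E[x^2] = theta^2 + 1, and a score-function estimator chosen only for illustration.

```python
import random


def score_function_gradient(theta, rng):
    """One sample of a score-function estimate of d/dtheta E[x^2], x ~ N(theta, 1)."""
    x = rng.gauss(theta, 1.0)
    # f(x) times the derivative of log N(x; theta, 1) with respect to theta.
    return x * x * (x - theta)


def mean_gradient(theta, n_samples, seed=0):
    """Average many sampled gradients; for an unbiased estimator this nears the truth."""
    rng = random.Random(seed)
    total = sum(score_function_gradient(theta, rng) for _ in range(n_samples))
    return total / n_samples


theta = 1.5
true_gradient = 2 * theta  # derivative of theta^2 + 1
estimate = mean_gradient(theta, 200_000)
print(f"true gradient: {true_gradient:.3f}, sampled mean: {estimate:.3f}")
```

A check like this can only suggest unbiasedness up to sampling noise, which is part of why bugs in such systems are hard to detect empirically. The methodology in the abstract replaces this kind of test with a machine-checked theorem.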
You can find discussion on HackerNews. The lead author was kind enough to answer some questions about this work.
Q: Is the correctness specification usually a fairly singular statement? Or will it often be of the form “The program satisfied properties A, B, C, D, and E”? (And then maybe you add “F” later.)
Daniel Selsam: There are a few related issues: how singular is a specification, how much of the functionality of the system is certified (coverage), and how close the specification comes to proving that the system actually does what you want (validation).… [continue reading]
President Obama was directly asked in a Wired interview about the dangers Bostrom raises regarding AI. From the transcript:
DADICH: I want to center our conversation on artificial intelligence, which has gone from science fiction to a reality that’s changing our lives. When was the moment you knew that the age of real AI was upon us?
OBAMA: My general observation is that it has been seeping into our lives in all sorts of ways, and we just don’t notice; and part of the reason is because the way we think about AI is colored by popular culture. There’s a distinction, which is probably familiar to a lot of your readers, between generalized AI and specialized AI. In science fiction, what you hear about is generalized AI, right? Computers start getting smarter than we are and eventually conclude that we’re not all that useful, and then either they’re drugging us to keep us fat and happy or we’re in the Matrix. My impression, based on talking to my top science advisers, is that we’re still a reasonably long way away from that. It’s worth thinking about because it stretches our imaginations and gets us thinking about the issues of choice and free will that actually do have some significant applications for specialized AI, which is about using algorithms and computers to figure out increasingly complex tasks. We’ve been seeing specialized AI in every aspect of our lives, from medicine and transportation to how electricity is distributed, and it promises to create a vastly more productive and efficient economy. If properly harnessed, it can generate enormous prosperity and opportunity. But it also has some downsides that we’re gonna have to figure out in terms of not eliminating jobs.
… [continue reading]