Notable reviews of arguments for AGI ruin

Here’s a collection of reviews of the arguments that artificial general intelligence represents an existential risk to humanity. They vary greatly in length and style. I may update this from time to time.

  • Here, from my perspective, are some different true things that could be said, to contradict various false things that various different people seem to believe, about why AGI would be survivable on anything remotely resembling the current pathway, or any other pathway we can easily jump to.
  • This report examines what I see as the core argument for concern about existential risk from misaligned artificial intelligence. I proceed in two stages. First, I lay out a backdrop picture that informs such concern. On this picture, intelligent agency is an extremely powerful force, and creating agents much more intelligent than us is playing with fire -- especially given that if their objectives are problematic, such agents would plausibly have instrumental incentives to seek power over humans. Second, I formulate and evaluate a more specific six-premise argument that creating agents of this kind will lead to existential catastrophe by 2070. On this argument, by 2070: (1) it will become possible and financially feasible to build relevantly powerful and agentic AI systems; (2) there will be strong incentives to do so; (3) it will be much harder to build aligned (and relevantly powerful/agentic) AI systems than to build misaligned (and relevantly powerful/agentic) AI systems that are still superficially attractive to deploy; (4) some such misaligned systems will seek power over humans in high-impact ways; (5) this problem will scale to the full disempowerment of humanity; and (6) such disempowerment will constitute an existential catastrophe. I assign rough subjective credences to the premises in this argument, and I end up with an overall estimate of ~5% that an existential catastrophe of this kind will occur by 2070. (May 2022 update: since making this report public in April 2021, my estimate here has gone up, and is now at >10%.)
  • Superintelligence
    Nick Bostrom
    Superintelligence asks the questions: What happens when machines surpass humans in general intelligence? Will artificial agents save or destroy us? Nick Bostrom lays the foundation for understanding the future of humanity and intelligent life. The human brain has some capabilities that the brains of other animals lack. It is to these distinctive capabilities that our species owes its dominant position. If machine brains surpassed human brains in general intelligence, then this new superintelligence could become extremely powerful - possibly beyond our control. As the fate of the gorillas now depends more on humans than on the species itself, so would the fate of humankind depend on the actions of the machine superintelligence. But we have one advantage: we get to make the first move. Will it be possible to construct a seed Artificial Intelligence, to engineer initial conditions so as to make an intelligence explosion survivable? How could one achieve a controlled detonation? This profoundly ambitious and original book breaks down a vast tract of difficult intellectual terrain. After an utterly engrossing journey that takes us to the frontiers of thinking about the human condition and the future of intelligent life, we find in Nick Bostrom’s work nothing less than a reconceptualization of the essential task of our time.
  • The alignment problem from a deep learning perspective
    Richard Ngo, Lawrence Chan, Sören Mindermann
    Within the coming decades, artificial general intelligence (AGI) may surpass human capabilities at a wide range of important tasks. We outline a case for expecting that, without substantial effort to prevent it, AGIs could learn to pursue goals which are undesirable (i.e. misaligned) from a human perspective. We argue that if AGIs are trained in ways similar to today’s most capable models, they could learn to act deceptively to receive higher reward, learn internally-represented goals which generalize beyond their training distributions, and pursue those goals using power-seeking strategies. We outline how the deployment of misaligned AGIs might irreversibly undermine human control over the world, and briefly review research directions aimed at preventing this outcome.
  • Despite the renewed interest in this concern, there remains substantial disagreement over both the nature and the likelihood of the existential threats posed by AI. Hence, our aim in this chapter is to explicate the main arguments that have been given for thinking that AI does pose an existential risk, and to point out where there are disagreements and weaknesses in these arguments. The chapter has the following structure: in §2, we will introduce the concept of existential risk, the sources of such risks, and how these risks are typically assessed. In §3–5, we will critically examine three commonly cited reasons for thinking that AI poses an existential threat to humanity: the control problem, global disruption from an AI “arms race”, and the weaponization of AI. Our focus is on the first of these three, because it represents a kind of existential risk that is novel to AI as a technology. While the latter two are equally important, they have commonalities with other kinds of technologies (e.g., nuclear weapons) discussed in the literature on existential risk, and so we will dedicate less time to them.
  • This is a summary of a commonly cited argument for existential risk from superhuman artificial intelligence (AI): that advanced AI systems will tend to pursue goals whose satisfaction would be devastatingly bad by the lights of any human, and that these AI systems will have the competence to achieve those goals. The argument appears to be suggestive but not watertight.

    (This is well-paired with Katja Grace’s summary of counterarguments.)

  • I’ve been citing "AGI Ruin: A List of Lethalities" to explain why the situation with AI looks lethally dangerous to me. But that post is relatively long, and emphasizes specific open technical problems over "the basics". Here are 10 things I’d focus on if I were giving "the basics" on why I’m so worried.
  • I will argue in this post that humanity is more likely than not to be taken over by misaligned AI if the following three simplifying assumptions all hold: (1) The "racing forward" assumption: AI companies will aggressively attempt to train the most powerful and world-changing models that they can, without "pausing" progress before the point when these models could defeat all of humanity combined if they were so inclined. (2) The "HFDT scales far" assumption: If HFDT (human feedback on diverse tasks) is used to train larger and larger models on more and harder tasks, this will eventually result in models that can autonomously advance frontier science and technology R&D, and continue to get even more powerful beyond that; this doesn’t require changing the high-level training strategy itself, only the size of the model and the nature of the tasks. (3) The "naive safety effort" assumption: AI companies put substantial effort into ensuring their models behave safely in "day-to-day" situations, but are not especially vigilant about the threat of full-blown AI takeover, and take only the most basic and obvious actions against that threat.
  • Modeling Transformative AI Risks (MTAIR) Project -- Summary Report
    Sam Clarke, Ben Cottier, Aryeh Englander, Daniel Eth, David Manheim, Samuel Dylan Martin, Issa Rice
    This report outlines work by the Modeling Transformative AI Risk (MTAIR) project, an attempt to map out the key hypotheses, uncertainties, and disagreements in debates about catastrophic risks from advanced AI, and the relationships between them. This builds on an earlier diagram by Ben Cottier and Rohin Shah which laid out some of the crucial disagreements ("cruxes") visually, with some explanation. Based on an extensive literature review and engagement with experts, the report explains a model of the issues involved, and the initial software-based implementation that can incorporate probability estimates or other quantitative factors to enable exploration, planning, and/or decision support. By gathering information from various debates and discussions into a single more coherent presentation, we hope to enable better discussions and debates about the issues involved. The model starts with a discussion of reasoning via analogies and general prior beliefs about artificial intelligence. Following this, it lays out a model of different paths and enabling technologies for high-level machine intelligence, and a model of how advances in the capabilities of these systems might proceed, including debates about self-improvement, discontinuous improvements, and the possibility of distributed, non-agentic high-level intelligence or slower improvements. The model also looks specifically at the question of learned optimization, and whether machine learning systems will create mesa-optimizers. The impact of different safety research on the previous sets of questions is then examined, to understand whether and how research could be useful in enabling safer systems. Finally, we discuss a model of different failure modes and loss of control or takeover scenarios.