AI goalpost moving is not unreasonable

[Summary: The constant evolution of tests for what counts as worryingly powerful AI is mostly a consequence of how hard it is to design tests that will identify the real-world power of future automated systems. I argue that Alan Turing in 1950 could not reliably distinguish a typical human from an appropriately fine-tuned GPT-4, yet none of our current automated systems can produce growth above historic trends.¹]

What does the phenomenon of “moving the goalposts” for what counts as AI tell us about AI?

It’s often said that people repeatedly revising their definition of AI, often in response to previous AI tests being passed, is evidence that people are in denial about or afraid of reality, and want to put their heads in the sand or whatever. There’s some truth to that, but that’s a comment about humans and I think it’s overstated.

Closer to what I want to talk about is the idea that AI is continuously redefined to mean “whatever humans can do that hasn’t been automated yet”, which is often taken as evidence that AI is not a “natural” kind out there in the world, but rather just a category relative to current tech. There’s also truth to this, but it’s not exactly what I’m interested in.

To me, it is startling that (I claim) we have systems today that would likely pass the Turing test if administered by Alan Turing, but that have negligible impact on a global scale. More specifically, consider fine-tuning GPT-4 to mimic a typical human who lacks encyclopedic knowledge of the contents of the internet. Suppose that it’s mimicking a human with average intelligence whose occupation has no overlap with Alan Turing’s expertise. Now if you put Turing in communication with ChatAverageJoe and ask him to determine whether it’s a machine or human without any info about modern AI progress, examples of past behavior, or previous attempts by other testers, I think it’s quite likely he’d not be able to distinguish it from a human even after a fairly lengthy conversation.

Without pretending to be anywhere near as smart as Turing, I would guess he would go about probing the AI armed only with a small subset of the tests/tricks that the field of AI research has developed over decades of trial and refinement. Turing would have to independently invent them without the benefit of this rich history.

A natural objection is that even laymen can start to notice ChatGPT’s deficiencies fairly quickly, but I suspect this is largely a combination of three things: (a) they don’t begin from a position of real uncertainty, (b) ChatGPT attempts to be way more broadly knowledgeable than a human, and (c) most modern people have an intuition for distinguishing computer-like behavior from human-like behavior that they get from everyday tasks like interacting with an automated phone system. (I think there’s probably a curse-of-knowledge thing here where you can’t even remember what it’s like to not have a gut instinct for some of this stuff.)

If this is true, it’s striking that current AI systems, inclusive of ChatGPT, are definitely not “powerful” in the sense of Transformative AI (TAI), i.e., inducing global economic growth significantly above historical rates.²

Now, even if I haven’t convinced you that ChatAverageJoe would pass the Turing-the-man test, I hope this at least gestures at the idea that coming up with an operational threshold years in advance for when AI will be powerful is super hard. I don’t think Alan Turing could do it, so what chance do we mortals have? The refusal to commit oneself to the best threshold one can think of at the time is not necessarily head-in-the-sand behavior.

Of course, relative to Turing, we do have the serious advantage of having access to more than half a century of technological progress and analysis, but this just goes to my point: identifying what automated systems are likely to be powerful is still an iterative “I’ll know it when I see it” kind of process that resists clean tests.

Footnotes

  1. A draft of this originally appeared on Twitter.
  2. I’m not making a claim here about whether we are temporally close to TAI, nor am I saying that the economy would be growing just as fast without all our current automated systems; I’m just saying that economic growth with current systems is well within historical trends, and would remain so if tech development were frozen.

2 Comments

  1. I think the so-called Turing Test for AI should really be a more full version of Turing’s “imitation game”, and that’s exactly what’s happening organically.

    Philosophically, Turing’s core idea is very sound: We humans infer that other humans are, well, people like us because they look & appear to behave as we do. We can’t actually tell if they have an inner self-awareness like we do; the best we can do is infer or assume it based on other evidence. Hence the hard issues that arise with people who’re paralyzed or “locked-in”: it is effectively impossible in some cases to tell if they’re still “in there” (conscious) or not.

    So in a very real sense the true “Turing Test” is the ability to distinguish a machine trying to imitate humans from actual humans. And this test is therefore really a measurement of a gap- the delta between how good a machine is at ‘acting human’ vs. how good the average human is at spotting a fake (the machine). Sort of like a predator distinguishing real monarch butterflies from tasty imitators like viceroys. It is the space between 2 moving goalposts, and an AI won’t ever “pass” until there’s a moment in time where that delta hits zero.

    • I believe “imitation game” was just Turing’s name for what is now called the Turing test.

      I’m arguing in this post that we may be upon machines that can pass any reasonable definition of the Turing test without actually having transformative economic impact, which is the opposite of what one might have expected to happen.
