A brain displayed with glowing blue lines.

Featured Image: Credit CC BY-SA 3.0, via Wikimedia Commons

Defining AGI: Why Benchmarks Keep Failing and Investors Keep Betting

AI Benchmarks, AI Ethics, AI Investment, Artificial General Intelligence, Future of AI, Microsoft & OpenAI

Suhail Ahmed

Artificial general intelligence (AGI) is the declared end goal of AI research: a machine that can think, learn, and adapt like a person across any domain. Yet despite billions of dollars in investment and relentless hype, no one agrees on what AGI actually is. Microsoft and OpenAI, once close partners, are now locked in a bitter dispute over the definition. Reports say their contract ties AGI to an arbitrary $100 billion profit threshold. That absurd standard points to a bigger problem: AGI is a moving target, and even the field’s best researchers admit it is hard to pin down.

Researchers at Google DeepMind have quipped that if you ask 100 experts to define AGI, you’ll get 100 different answers. Some see it as an economic force; others treat it as a quasi-spiritual breakthrough. This disagreement isn’t just an academic problem; it shapes investment, regulation, and public perception. If we can’t define AGI, how can we tell whether we’re getting closer to it? And why do big tech companies keep betting billions on a goal they can’t even articulate?

The AGI Definition Wars: From Human Parity to Mystical Chants

two hands touching each other in front of a pink background
Image by Igor Omilaev via Unsplash

The most common definition of AGI is an AI that can do everything a human can do: generalize knowledge, take on unfamiliar tasks, and perform at human level across a wide range of fields. But this immediately raises hard questions: Which human? A skilled surgeon? An average poet? A Nobel laureate? No single person is good at everything, so why should AGI be?

OpenAI’s charter defines AGI as highly autonomous systems that “outperform humans at most economically valuable work,” a vague, profit-driven measure. Meanwhile, former OpenAI Chief Scientist Ilya Sutskever led employees in chants of “Feel the AGI!”, treating it like a spiritual awakening rather than a technical standard. Dario Amodei, the CEO of Anthropic, dislikes the term altogether, calling it “imprecise” and preferring “Expert-Level Science and Engineering.”

This definitional mess isn’t just philosophical; it’s contractual. Microsoft and OpenAI have a $13 billion deal that lets OpenAI limit Microsoft’s access to its technology once AGI is reached. But how can such a clause be enforced if no one agrees on what AGI means?

A Brief History of Moving Goalposts

a computer chip with the letter a on top of it
Image by Igor Omilaev via Unsplash

The search for AGI has always been dogged by shifting definitions. AI pioneer Herbert A. Simon predicted in 1965 that within 20 years machines would be capable of doing “any work a man can do.” When that didn’t happen, the goalposts moved. The Turing Test was once the gold standard of machine intelligence, but today’s chatbots can imitate human conversation convincingly without anything resembling understanding.

In the 2000s, AGI was redefined in economic terms: performing “most economically valuable tasks.” By some benchmarks we have already arrived; by others, AGI remains decades away. Because there is no fixed definition, companies can claim progress without building anything like true general intelligence.

The $100 Billion Benchmark: When Profit Becomes a Proxy for Intelligence

Close-up of a smartphone displaying ChatGPT app held over AI textbook.
Image by Sanket Mishra via Pexels

The reported Microsoft-OpenAI AGI threshold of $100 billion in profit shows how thoroughly economics has distorted the definition. Profit is a measure of market power, not intelligence. Does an AI system “think” like a person because it makes money automating customer service or stock trading? Of course not. Yet the metric persists because it is far easier to measure than intelligence itself.

Satya Nadella, the CEO of Microsoft, has dismissed self-declared AGI milestones as “nonsensical benchmark hacking,” yet investors keep funding the dream. The result is a self-perpetuating cycle: hype attracts funding, and funding generates more hype, whether or not real AGI is anywhere in sight.

Why AGI Benchmarks Keep Failing

Abstract green matrix code background with binary style.
Image by Markus Spiske via Pexels

Attempts to set objective AGI benchmarks have repeatedly fallen short. François Chollet created the Abstraction and Reasoning Corpus (ARC-AGI) to test novel problem-solving skills that current AI lacks. But even this benchmark is imperfect, because intelligence is not a single number. Reasoning, creativity, and adaptability are all parts of human cognition, and none of them reduces cleanly to a standardized test.
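
To make the format concrete, here is a toy, ARC-style puzzle: a solver sees a few input-to-output grid pairs and must infer the hidden rule well enough to apply it to a fresh grid. The grids, the rule (a horizontal flip), and the brute-force “solver” below are invented for illustration; real ARC-AGI tasks are far more varied and resist this kind of enumeration.

```python
# Toy ARC-style task: infer a grid transformation from examples, then apply it.
# The grids and candidate rules are illustrative, not real ARC-AGI data.
train_pairs = [
    ([[1, 0], [2, 3]], [[0, 1], [3, 2]]),
    ([[5, 5, 0], [0, 7, 7]], [[0, 5, 5], [7, 7, 0]]),
]

candidates = {
    "identity": lambda g: g,
    "flip_h": lambda g: [list(reversed(row)) for row in g],  # mirror each row
    "flip_v": lambda g: list(reversed(g)),                   # mirror row order
}

# Keep only the rules consistent with every training pair.
consistent = [name for name, fn in candidates.items()
              if all(fn(x) == y for x, y in train_pairs)]

print(consistent)                              # ['flip_h']
print(candidates[consistent[0]]([[9, 0, 0]]))  # [[0, 0, 9]]
```

The catch, and the reason ARC-AGI is hard, is that real tasks draw on an open-ended space of rules, so a fixed candidate list like this one can never be enumerated in advance.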

Data contamination is an even bigger problem. If a model has memorized test answers from its training data, it can pass benchmarks without any real understanding. Large language models (LLMs) excel at pattern matching but struggle with genuine reasoning. Benchmarks won’t be adequate until they can evaluate how a model solves a problem, not just whether it produces the right answer.
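
As a minimal sketch of what a decontamination pass can look like, in the spirit of the n-gram overlap checks described in the GPT-3 paper (the window size, normalization, and toy data here are my own illustrative choices):

```python
import re

def ngrams(text, n):
    """Lowercase, strip punctuation, and return the set of word n-grams."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(benchmark_item, training_docs, n=8):
    """Flag an item whose n-grams also appear verbatim in training text."""
    grams = ngrams(benchmark_item, n)
    return any(grams & ngrams(doc, n) for doc in training_docs)

train_doc = "Trivia time: what is the capital of France? The answer is Paris."
question = "What is the capital of France? The answer is"
print(is_contaminated(question, [train_doc], n=8))  # True -> flag or exclude
```

A check like this catches verbatim leakage but not paraphrased leakage, which is one more reason a benchmark score alone cannot certify understanding.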

The Expert Divide: Is AGI Decades Away or Already Here?

An elderly scientist contemplates a chess move against a robotic arm on a chessboard.
Image by Pavel Danilyuk via Pexels

Despite the hype, most AI researchers remain skeptical that AGI is imminent. A 2025 survey by AAAI found that 76% of respondents believe scaling up current AI methods will not lead to AGI. Yet the estimates keep moving: a report from two years earlier found that experts had shortened their predicted AGI timelines by 13 years in light of recent advances in AI.

AI commentator Dwarkesh Patel has argued that AI systems are still roughly seven years away from learning continuously on the job the way people do. Others, like Sam Altman of OpenAI, insist AGI is almost here. This inability to reach consensus harms more than academic discourse; it distorts policy and investment and erodes public trust.

The Real Cost of AGI Hype

The AGI definition crisis has real-world consequences:

  • Misallocated investment: Billions of dollars flow to startups promising “AGI-like” systems that are really just repackaged narrow AI.
  • Regulatory confusion: Governments struggle to write laws for a technology no one can define.
  • Public distrust: AI keeps making promises it can’t keep, and people notice.

Without clear definitions, we risk repeating the AI winter of the 1970s, when funding dried up because hype had outpaced reality.

Beyond AGI: A Smarter Way to Measure AI Progress

A robotic hand reaching into a digital network on a blue background, symbolizing AI technology.
Image by Tara Winstead via Pexels

We should focus on specific abilities instead of chasing an undefined AGI milestone:

  • Can AI pick up new tasks without retraining?
  • Can it explain the reasoning behind its answers?
  • Can it operate safely without human oversight?

Google DeepMind’s five-level AGI framework, which grades systems from “emerging” to “superhuman,” is a step in the right direction, though it may still be too rigid. Intelligence isn’t a yes-or-no question; it’s a spectrum.
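
As a rough sketch of what that graded approach looks like in practice (the level names follow DeepMind’s “Levels of AGI” paper; the percentile cutoffs and the lookup function are my simplification of its descriptions):

```python
# Sketch of a graded, per-capability rating in the spirit of DeepMind's
# "Levels of AGI" framework. Cutoffs approximate the paper's descriptions;
# real evaluation would also score generality across many tasks.
from enum import IntEnum

class AGILevel(IntEnum):
    EMERGING = 1    # comparable to or somewhat better than an unskilled human
    COMPETENT = 2   # at least 50th percentile of skilled adults
    EXPERT = 3      # at least 90th percentile of skilled adults
    VIRTUOSO = 4    # at least 99th percentile of skilled adults
    SUPERHUMAN = 5  # outperforms all humans

def grade(percentile_vs_skilled_adults: float) -> AGILevel:
    """Rate performance on ONE task; generality is judged separately."""
    if percentile_vs_skilled_adults >= 100:
        return AGILevel.SUPERHUMAN
    if percentile_vs_skilled_adults >= 99:
        return AGILevel.VIRTUOSO
    if percentile_vs_skilled_adults >= 90:
        return AGILevel.EXPERT
    if percentile_vs_skilled_adults >= 50:
        return AGILevel.COMPETENT
    return AGILevel.EMERGING

print(grade(95).name)  # EXPERT: strong on this task, silent about all others
```

The point of a scheme like this is that “is it AGI?” stops being a single bet and becomes a profile of scores, which is much harder to game with one headline number.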

The uncomfortable truth? AGI may never be a coherent category at all. If we fixate on a vague finish line, we risk overlooking the real AI progress that actually helps people. Until we agree on what AGI means, or retire the word altogether, we’ll keep chasing a mirage.
