Building AGI Using Language Models | Leo Gao

One-liner

While GPT-3 and potential future GPT-x models impressively emulate human writing, they are not Artificial General Intelligence (AGI); however, the implicit world models that language models learn may offer a pathway to building a proto-AGI.

Key insights

GPT-3’s Limitations and Potential

GPT-3, despite its advanced capabilities, is not AGI. It has no memory of past interactions and no ability to pursue goals or maximize utility. Its sole objective is to predict text by maximizing the likelihood of natural language data. As a model's loss approaches the Shannon entropy of natural language, its output becomes nearly indistinguishable from human-written text, and further gains must come from finer semantic and logical consistency, which is precisely where a world model helps prediction.
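
A minimal sketch of that prediction objective and the entropy floor it approaches may help; the notation here is ours, not the post's:

```latex
% Autoregressive language modeling: minimize the cross-entropy between the
% model q_\theta and the true distribution p of natural language.
\mathcal{L}(\theta)
  = -\,\mathbb{E}_{x \sim p}\!\left[\frac{1}{T}\sum_{t=1}^{T} \log q_\theta(x_t \mid x_{<t})\right]
  \;\ge\; H(p)
% H(p), the Shannon entropy of natural language, is the theoretical floor on
% the loss; the bound is tight only when q_\theta matches p, i.e. when the
% model's text is statistically indistinguishable from human text.
```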

The Road to Proto-AGI

Theoretically, a sufficiently advanced language model like GPT-3 could serve as a proto-AGI by leveraging the world model it expresses in language. Such a world model emerges because predicting language well inherently requires some grasp of human behavior and world knowledge. As model size increases and performance improves, the hope is that these models will reach a level of world modeling comparable to that of the humans who wrote the text on the internet, which is already a substantial level of competence.

Turning World Models into Agents

Language models contain world models, but a world model alone is not an agent. To turn a world model into an AGI agent, one must define a goal (e.g., "maximize the number of paperclips") and use the language model to determine which actions move toward that goal. The main challenge is evaluating the consequences of candidate actions. The proposed solution is to use Monte Carlo Tree Search over the language model's predictions to simulate outcomes and guide decisions. To execute abstract actions, the language model could expand them into progressively more detailed instructions, akin to Hierarchical Reinforcement Learning.
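
As a thought experiment, the search could look roughly like the sketch below. Every name in it (propose_actions, predict_outcome, estimate_progress) is a placeholder we introduce for illustration; the stubs stand in for real GPT-3 prompts, and the search is a deliberately simplified MCTS rather than a production implementation:

```python
import math
import random

def propose_actions(state: str, goal: str, k: int = 3) -> list:
    """Stub: a real system would prompt the language model for k candidate actions."""
    return [f"candidate action {i} toward '{goal}'" for i in range(k)]

def predict_outcome(state: str, action: str) -> str:
    """Stub: use the language model as a world model to describe the resulting state."""
    return f"{state} -> {action}"

def estimate_progress(state: str, goal: str) -> float:
    """Stub: score how close a described state looks to the goal (random stand-in)."""
    return random.random()

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def uct_select(node, c=1.4):
    """Pick the child balancing exploitation (high value) and exploration (few visits)."""
    return max(node.children,
               key=lambda ch: ch.value / (ch.visits + 1e-9)
                              + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))

def mcts(root_state: str, goal: str, iterations: int = 50, rollout_depth: int = 3) -> str:
    root = Node(root_state)
    for _ in range(iterations):
        # Selection: descend through expanded nodes by UCT score.
        node = root
        while node.children:
            node = uct_select(node)
        # Expansion: add a child for each action the language model proposes.
        for action in propose_actions(node.state, goal):
            node.children.append(Node(predict_outcome(node.state, action), parent=node))
        node = random.choice(node.children)
        # Simulation: roll the world model forward a few steps and score the endpoint.
        sim_state = node.state
        for _ in range(rollout_depth):
            sim_state = predict_outcome(sim_state, propose_actions(sim_state, goal, 1)[0])
        reward = estimate_progress(sim_state, goal)
        # Backpropagation: credit the result to every node on the path back to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).state

if __name__ == "__main__":
    print(mcts("You are in a workshop with a spool of steel wire.",
               "maximize the number of paperclips"))
```

In a real system, each stub would be a prompt to the language model, and the abstract action chosen by the search would itself be expanded recursively into more concrete instructions, in the spirit of Hierarchical Reinforcement Learning.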

Prerequisites for Real-World Interaction

To interact with the real world, the AGI would use an input module to convert arbitrary kinds of data into natural language compatible with the agent's language-based thinking process, and the language model itself could then convert the chosen natural-language actions into executable commands. In principle, this lets the system absorb external information, deliberate over responses, and express actions entirely through language.
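
A minimal sketch of that input/output glue, assuming a text-only thinking core; the names here (encode_observation, decide_action, to_command) are illustrative placeholders rather than anything specified in the original post:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    source: str   # e.g. "camera caption", "inbox"
    payload: str  # raw content, already transcribed or captioned upstream

def encode_observation(obs: Observation) -> str:
    """Input module: convert arbitrary data into natural language for the agent."""
    return f"Observation from {obs.source}: {obs.payload}"

def decide_action(context: str, goal: str) -> str:
    """Stand-in for the search over the language model's world model (see the MCTS sketch above)."""
    return f"Given that {context!r}, take one concrete step toward {goal!r}."

def to_command(action: str) -> dict:
    """Output module: translate a natural-language action into an executable command."""
    # A real system would have the language model emit structured API calls or code here.
    return {"type": "noop", "description": action}

def agent_step(obs: Observation, goal: str) -> dict:
    context = encode_observation(obs)       # absorb external information as text
    action = decide_action(context, goal)   # deliberate in natural language
    return to_command(action)               # express the action to the outside world

print(agent_step(Observation("inbox", "A supplier offers cheap steel wire."),
                 "maximize the number of paperclips"))
```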

Key quotes

  1. "Natural language essentially encodes information about the world—the entire world, not just the world of the Goban, in a much more expressive way than any other modality ever could."
  2. "By harnessing the world model embedded in the language model, it may be possible to build a proto-AGI."
  3. "The biggest and most likely to be wrong assumption that I’m making is that larger models will develop better world models."
  4. "A world model alone does not an agent make, though. So what does it take to make a world model into an agent?"
  5. "This is more a thought experiment than something that’s actually going to happen tomorrow; GPT-3 today just isn’t good enough at world modelling."

Make it stick

  1. The Model-World Analogy: Just as mastering a game's rules is crucial before strategizing a win, perfecting language prediction in GPT models is a precursor to constructing a world model for AGI.
  2. The Shannon Entropy Benchmark: the Shannon entropy of natural language is the theoretical floor on a language model's loss; approaching that floor means the model's text becomes nearly indistinguishable from human writing.
  3. AGI Recipe: Define a goal, probe a world model for action guidance, predict outcomes with Monte Carlo Tree Search, and express calculated actions in a universally understandable code: natural language.