While GPT-3 and potential future GPT-x models emulate human writing impressively, they are not Artificial General Intelligence (AGI). Insights gained from language models may nonetheless offer a pathway to proto-AGI by harnessing the models' implicit world models.
GPT-3, despite its advanced capabilities, is not AGI. It has no memory of past interactions and no ability to pursue goals or maximize utility. Its primary function is to predict text, and it is trained to maximize the likelihood of natural language data. As models approach the Shannon entropy of natural language and their output becomes indistinguishable from human text, further improvements must come from finer semantic and logical consistency, areas where a world model becomes beneficial for prediction.
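The link between likelihood maximization and Shannon entropy can be written out explicitly. As a sketch, let p denote the true distribution of natural language text and q_theta the language model; both symbols are notation introduced here for illustration and do not come from the source.

```latex
\mathcal{L}(\theta)
  = -\mathbb{E}_{x \sim p}\left[\log q_\theta(x)\right]
  = H(p) + D_{\mathrm{KL}}\left(p \,\|\, q_\theta\right)
  \ge H(p)
```

Maximizing likelihood is the same as minimizing this cross-entropy loss, which can never fall below the entropy H(p). The remaining gap is the KL divergence between real text and model output, so once the coarse statistics of language are captured, any further reduction has to come from subtler regularities such as semantic and logical consistency.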
Theoretically, a sufficiently advanced language model like GPT-3 could serve as a proto-AGI by leveraging its world model, which is expressed in language. This world model emerges because understanding and predicting language inherently requires some grasp of human behavior and world knowledge. As model size and performance increase, the hope is that these models will reach a level of world modeling comparable to that of humans writing on the internet, which is already quite competent.
Language models can act as world models, but they are not agents. To turn a world model into an AGI agent, one must define a goal (e.g., "maximize number of paperclips") and use the language model to determine actions that lead to fulfilling it. The difficulty lies in accurately determining the consequences of potential actions. The proposed solution is to use Monte Carlo Tree Search to simulate outcomes and guide decisions. To execute abstract actions, the language model could expand them into detailed instructions, in a manner akin to Hierarchical Reinforcement Learning.
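The planning step described above can be illustrated with a minimal Monte Carlo Tree Search sketch. Everything in it is an assumption made for illustration: `toy_world_model` is a hard-coded stand-in for the language model's prediction of an action's consequences, and the action list, the horizon, and the paperclip dynamics are invented. A real system would prompt a language model instead of applying fixed rules.

```python
import math
import random

ACTIONS = ["make paperclip", "build machine", "idle"]  # hypothetical action set
HORIZON = 6  # planning depth in steps

def toy_world_model(state, action):
    """Hypothetical stand-in for the language model: predict the next state.

    state is (paperclips, machines); each machine yields 2 clips per step.
    """
    clips, machines = state
    if action == "make paperclip":
        clips += 1
    elif action == "build machine":
        machines += 1
    return (clips + 2 * machines, machines)

# Best case: build a machine every step, giving 2 * (1 + 2 + ... + HORIZON) clips.
MAX_CLIPS = HORIZON * (HORIZON + 1)

def reward(state):
    """Goal 'maximize number of paperclips', scaled to [0, 1] for UCB1."""
    return state[0] / MAX_CLIPS

class Node:
    def __init__(self, state, depth, parent=None):
        self.state = state
        self.depth = depth
        self.parent = parent
        self.children = {}  # action -> Node
        self.visits = 0
        self.total = 0.0

    def untried(self):
        return [a for a in ACTIONS if a not in self.children]

def mcts(root_state, iterations=2000, c=1.4, seed=0):
    """Return the first action that the search visited most often."""
    rng = random.Random(seed)
    root = Node(root_state, depth=0)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes using UCB1.
        while node.depth < HORIZON and not node.untried():
            parent = node
            node = max(
                parent.children.values(),
                key=lambda ch: ch.total / ch.visits
                + c * math.sqrt(math.log(parent.visits) / ch.visits),
            )
        # 2. Expansion: try one action that has not been explored here yet.
        if node.depth < HORIZON:
            action = rng.choice(node.untried())
            child = Node(toy_world_model(node.state, action), node.depth + 1, node)
            node.children[action] = child
            node = child
        # 3. Simulation: random rollout to the horizon using the world model.
        state, depth = node.state, node.depth
        while depth < HORIZON:
            state = toy_world_model(state, rng.choice(ACTIONS))
            depth += 1
        value = reward(state)
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total += value
            node = node.parent
    return max(root.children, key=lambda a: root.children[a].visits)

if __name__ == "__main__":
    print(mcts((0, 0)))
```

In this toy setting the search should come to favor building machines early, because their output compounds over the horizon. The point of the sketch is only the division of labor: the world model predicts consequences, the goal scores them, and the tree search chooses among actions.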
To interact with the real world, the AGI would use an input module that converts various types of data into natural language compatible with the agent's thinking process. The language model could then convert the chosen natural language actions into executable commands. In theory, this lets the system absorb external information, deliberate on a response, and express actions, all through language.
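A rough sketch of that pipeline follows, assuming a toy paperclip setting. All function names, the observation fields, and the JSON command format are hypothetical; `choose_action` is a keyword-matching placeholder for the step where a real agent would query the language model.

```python
import json

def describe_observation(obs):
    """Input module (hypothetical): turn structured data into natural language."""
    wire = obs["wire_stock"]
    stock = "empty" if wire == 0 else f"at {wire} units"
    return f"The wire stock is {stock}. There are {obs['paperclips']} paperclips."

def choose_action(observation_text):
    """Placeholder for deliberation; a real agent would consult the language model."""
    if "wire stock is empty" in observation_text:
        return "order more wire"
    return "make paperclip"

# Hypothetical mapping from natural language actions to executable commands.
COMMANDS = {
    "order more wire": {"command": "purchase", "item": "wire", "quantity": 100},
    "make paperclip": {"command": "run_machine", "machine": "bender", "count": 1},
}

def to_command(action_text):
    """Output module (hypothetical): translate a language action into JSON."""
    return json.dumps(COMMANDS[action_text], sort_keys=True)

def step(obs):
    """One perceive, think, act cycle, with language as the shared format."""
    text = describe_observation(obs)
    action = choose_action(text)
    return to_command(action)

if __name__ == "__main__":
    print(step({"wire_stock": 0, "paperclips": 3}))
```

Keeping every intermediate value as text is the design choice worth noting: the same language model could in principle sit behind all three stages, with only the outermost adapters touching non-linguistic data.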