Just over a year ago, OpenAI, an artificial intelligence company based in San Francisco, startled the world by demonstrating a dramatic leap in the quality of computer-generated natural language. Not only could its program complete sentences and answer questions, it could also generate lengthy passages of text that closely resembled human writing.
The OpenAI team's latest work, known as GPT-3, shows where that line of research has led. The new model was developed by some of the same authors as the previous version, including Alec Radford and Ilya Sutskever, along with collaborators from Johns Hopkins University, and it scales the earlier approach up dramatically.
GPT-3 is now an immensely powerful language model, capable of digesting and analyzing two orders of magnitude more text than its predecessor.
While the focus has been on scaling up the power of language models, the OpenAI team acknowledges that there may be limits to this approach. Merely increasing the model's computational capabilities and feeding it more text may not guarantee better results. This recognition is a notable admission within a paper that primarily celebrates the achievement of utilizing more computing power to address problems.
To fully appreciate the significance of the authors' conclusion, it is necessary to understand the context. OpenAI's language model research has followed a steady progression, with each iteration becoming more successful as the technology grew in size.
The original GPT and its successor, GPT-2, were adaptations of a model called the Transformer, which was initially developed by Google in 2017. The Transformer employed attention mechanisms to calculate the probability of word appearance based on surrounding words. OpenAI drew attention when it decided against releasing the source code for the largest version of GPT-2, citing concerns about potential misuse and dissemination of fake news.
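The attention idea at the Transformer's core can be sketched in a few lines. The function names, shapes, and single-head setup below are illustrative assumptions, not OpenAI's or Google's code: each position in a sequence scores every other position, and those scores become weights for blending the surrounding representations.

```python
# A minimal sketch of scaled dot-product attention, the core mechanism
# of the Transformer. Shapes and names here are illustrative only.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Each query scores every key; the scores become weights over the
    # values, so each position's output is a context-weighted mix of
    # the other positions' representations.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (seq, seq) similarity matrix
    weights = softmax(scores)          # each row sums to 1
    return weights @ V                 # (seq, d_v) blended vectors

# Three toy token vectors attending to one another (self-attention).
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = attention(x, x, x)
print(out.shape)  # (3, 4)
```

In a full Transformer this operation is repeated across many "heads" and layers, but the probability-weighting of surrounding words described above is all happening in those two matrix products.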
The new paper takes GPT to the next level by further increasing its size. The largest version of GPT-2 consisted of 1.5 billion parameters, while GPT-3 boasts a staggering 175 billion parameters. Parameters in a neural network control the weighting of data, giving prominence to certain aspects and shaping the network's learned perspective.
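As a rough illustration of what "parameters" means here (this toy layer is an assumption for exposition, not a piece of GPT-3): a parameter is simply one learned number, and a model's headline size is the count of all such numbers across its layers.

```python
# Illustrative only: parameters are the learned numbers in a network.
import numpy as np

rng = np.random.default_rng(1)

# A single fully connected layer mapping 4 input features to 2 outputs.
W = rng.normal(size=(4, 2))  # weight matrix: 4 * 2 = 8 parameters
b = np.zeros(2)              # bias vector: 2 parameters

def layer(x):
    # Each output is a weighted sum of the inputs; training nudges
    # W and b so that useful input features receive larger weights.
    return x @ W + b

n_params = W.size + b.size
print(n_params)  # 10
```

GPT-3's 175 billion parameters are, conceptually, this same bookkeeping repeated at a vastly larger scale.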
Increasing the parameter count has consistently produced impressive benchmark results for the GPT family of programs, as well as for other Transformer derivatives such as Google's BERT. Despite criticism suggesting that these language models lack any deep understanding of language, their exceptional performance on tests remains significant.
GPT-3 represents quantitative progress, leveraging the massive Common Crawl dataset, which comprises nearly a trillion words scraped from the web, for training. With 175 billion parameters, GPT-3 achieves what the authors refer to as "meta-learning": the network can take on tasks such as sentence completion without any task-specific re-training. Shown one or a few examples in its prompt, such as an incomplete sentence paired with its completion, GPT-3 can then complete new sentences it has never seen.
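A hypothetical prompt of the kind the paper describes might look like the string below. The exact format is an assumption for illustration; the point is that the task is conveyed entirely through examples inside the input text, with no gradient updates to the model.

```python
# A hypothetical few-shot prompt: the worked example "teaches" the
# task at inference time, and the model is asked to continue the text.
prompt = (
    "Complete the sentence.\n"
    "Input: The sun rises in the\n"
    "Output: The sun rises in the east.\n"
    "\n"
    "Input: Water freezes at zero degrees\n"
    "Output:"
)
# A model like GPT-3 would be given this string and asked to generate
# the continuation after the final "Output:".
print(prompt.count("Input:"))  # 2
```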
GPT-3's ability to reach this level of performance from a single prompt surpasses that of Transformer versions fine-tuned for specific tasks. GPT-3 is thus a triumph of generality: after exposure to an enormous body of text, its parameter weights are good enough that it performs remarkably well across a wide range of specific tasks without any further development.
However, the conclusion of the new paper reveals some shortcomings. Despite notable improvements compared to GPT-2, GPT-3 still exhibits weaknesses. For instance, it struggles to achieve significant accuracy in Adversarial NLI, a test that assesses a program's ability to determine the relationship between two sentences. The authors acknowledge that GPT-3's performance in such tasks is only slightly better than chance, and they are uncertain as to why these limitations persist despite the increased computational power.
Consequently, the authors propose that the current approach of predicting language outcomes may be flawed. They suggest that language systems, like virtual assistants, should focus less on making predictions and instead prioritize goal-directed actions. How they plan to pursue this new direction remains a topic for future exploration.
Despite the realization that bigger might not always be better, the improved performance of GPT-3 across multiple tasks is likely to fuel the desire for even larger neural networks. With 175 billion parameters, GPT-3 currently rules as the largest neural network. However, the machine learning community is already envisioning future networks with over one trillion parameters, as demonstrated in a presentation by AI chip company Tenstorrent in April.
For now, bigger and bigger language models will continue to represent the cutting edge in machine learning.