The Future of AI and Language Models: Matching Data and Compute Power Is Key
Scaling AI requires increasing compute power and data in tandem, says OpenAI’s Ilya Sutskever.
Deep learning’s academic branch has underestimated data’s importance because research is usually framed as a benchmark on a fixed dataset. A better method is needed than what currently exists, according to Ilya Sutskever, co-founder and chief scientist at OpenAI, which aims to build artificial general intelligence that benefits all of humanity.
Sutskever spoke with Alexandr Wang, CEO and founder of Scale AI, during a fireside chat at the TransformX conference. They discussed the progress being made in AI and where innovation will come from. Sutskever believes that huge progress is possible from algorithmic and methodological improvements. “We are nowhere close to being as efficient as we can be with our compute,” he said, while acknowledging that there’s been a lot of advancement.
By now, he added, it’s been proven that domains with a lot of data will experience a lot of progress. Further, a really good prediction requires having a meaningful degree of understanding using whatever data is given to a model.
This kind of thinking has led OpenAI to experiment with different approaches to try to predict things well, such as the next word or the next pixel, and then study their properties. About four years ago, OpenAI found the “sentiment neuron” in a small neural net trained to predict the next character in reviews of Amazon products.
It proved the principle that if you predict the next character well enough, you will eventually start to discover the semantic properties of the text, said Sutskever.
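As a rough illustration of what "predicting the next character" means, here is a minimal sketch in Python. It is not OpenAI's model: it is a simple bigram counter, the corpus is invented for this example, and the function names `train_bigram` and `predict_next` are hypothetical. A model this small only learns which character tends to follow another; it is far too simple to discover anything like sentiment.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each character, how often each next character follows it."""
    counts = defaultdict(Counter)
    for current, nxt in zip(text, text[1:]):
        counts[current][nxt] += 1
    return counts


def predict_next(counts, char):
    """Return the most frequent follower of `char`, or None if it was never seen."""
    followers = counts.get(char)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus standing in for product reviews (invented for illustration).
corpus = "great product. great price. great value."
model = train_bigram(corpus)

print(predict_next(model, "g"))  # prints "r", since "g" is always followed by "r" here
```

The same objective, applied with a much larger neural network and far more text, is what Sutskever describes as eventually surfacing semantic properties of the text.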
The Limits of Scaling
With its roots in academia, machine learning has traditionally followed this approach: Someone builds something and creates a fixed benchmark using a dataset with certain characteristics, and then people compare their methods on the dataset. However, that forces everyone to work with the same fixed dataset.
OpenAI has released generative pre-trained transformer (GPT) models that have allowed the company to produce better results using larger architectures. And in some domains, such as language, there is quite a bit of valuable data. In more specialized subdomains, the amount of data is a lot smaller.
GPTs have shown that scaling requires increasing the compute and the data in tandem. Whenever data is abundant, it’s possible to apply deep learning and produce increasingly powerful models.
Wang noted that as compute power becomes more efficient, people are also getting more efficient at generating and finding data and at producing better algorithms, which will help them keep doing incredible things.
For his part, Sutskever isn’t concerned about whether progress will continue as computers become faster and engineers train better models. Rather, he said, continued progress will require people to be more creative about using compute power to compensate for a lack of data.
To get something akin to a Moore’s law for data will require either improving methods so you can do more with the same data or doing the same with less data. Both will be needed to make the most progress, Sutskever said. Another option, he said, is to increase the efficiency of the teachers.
The Future of Codex
Wang and Sutskever also discussed expectations for Codex, a large GPT neural network trained on code. The goal is for Codex to be able to predict the next word in code. Sutskever said most people are not aware that it will be possible to train a neural network so that, if it is given some representation of text describing what you want, it will be able to process the text and produce correct code.
This is exciting because code is a domain that hasn't fully been touched by AI yet, and the approach focuses on reasoning and carefully laying out plans, which is where today's deep learning has been perceived as weak, he said.
One distinction between Codex and language models is that the former can, in effect, control the computer, with the computer acting as an actuator. This makes the Codex models much more useful to programmers, especially in instances where they need to know random APIs, Sutskever said.
Other Advancements
Other recent advancements from OpenAI include CLIP and DALL-E, neural networks that learn to associate text with images. DALL-E associates text with images in the generative direction. CLIP associates text with images in perception—going from an image to text, versus going from text to image.
The real motivation with CLIP and DALL-E was to “dip our toes into ways of combining the two modalities,” because people want to have a neural network that understands the visual world, Sutskever said. The goal is that, by connecting the textual world to the visual world, these neural networks will have a better understanding of text that comes a little closer to humans.
In terms of advances to neural networks, Sutskever believes the business-as-usual, “mundane progress we've seen over the past few years will continue.” In particular, he said, he expects language and vision models, as well as text-to-speech and speech-to-text systems, to improve.
Deep learning will continue expanding, and there will be a lot more deep learning data centers. “We’ll see lots of interesting neural networks trained on all kinds of tasks,” he said. Progress in medicine and biology will be especially exciting, he said.
AI is a very powerful technology that can work on all kinds of applications to solve real problems, Sutskever said. But people should also work on methods to address the problems that exist with the technology, such as bias and undesirable outputs. Also, whenever possible, he said, they should work on reducing real harms.
Learn More
For more about Sutskever’s predictions about deep learning, watch his fireside chat, “What's Next for AI Systems & Language Models With Ilya Sutskever of OpenAI,” and read the full transcript here.