Ilya Sutskever is Co-founder and Chief Scientist of OpenAI, which aims to build artificial general intelligence that benefits all of humanity. He leads research at OpenAI and is one of the architects behind the GPT models. Prior to OpenAI, Ilya was co-inventor of AlexNet and Sequence to Sequence Learning, as well as AlphaGo and TensorFlow. He earned his B.Sc, M.Sc, and Ph.D in Computer Science from the University of Toronto.
Ilya Sutskever is Co-founder and Chief Scientist of OpenAI, which aims to build artificial general intelligence (AGI) that benefits all of humanity. He is one of the world’s most respected, and cited, experts in machine learning. Ilya leads research at OpenAI and is one of the architects behind GPT models. He joins Scale AI CEO Alexandr Wang in a fireside chat to discuss recent developments in AI and what the future holds for language models and beyond.
Alexandr Wang (00:23): I'm excited to welcome our next speaker Ilya Sutskever. Ilya is co-founder and chief scientist at OpenAI, which aims to build artificial general intelligence that benefits all of humanity. Ilya leads all of research at OpenAI and was one of the architects behind the GPT models, and has been cited over 260,000 times.
Alexandr Wang (00:43): Prior to OpenAI, Ilya was the co-inventor of groundbreaking advances that push the boundaries of AI, like AlexNet, Sequence to Sequence Learning, as well as AlphaGo and TensorFlow. He earned his bachelor's, master's and PhD in computer science from the University of Toronto.
Alexandr Wang (01:02): Thank you so much for joining us, Ilya. I'm super excited to talk to you today.
Ilya Sutskever (01:05): Yeah, thanks for inviting me.
Alexandr Wang (01:06): Yeah. I think we'll talk about a lot of interesting things over the course of this session. And there's a lot of interesting advancements in ML and AI recently that OpenAI has really pioneered. So I guess, to start off with, you really have been one of the world's sort of leading researchers in machine learning and have driven many breakthroughs over the course of the past decade plus in the field. And I actually just taking a big step back, I'm curious what originally attracted you to machine learning AI? And sort of how have your aspirations for the field kind of evolved over time?
Ilya Sutskever (01:41): I definitely was interested in AI from a fairly early age. I can't explain necessarily why but it felt very interesting and very exciting. And I think I was fortunate to realize fairly early on, as I was looking into what has been done in AI, that learning is something that's both really important for intelligence, and something that we had no idea how to do at all. And so when my family moved to Canada, I remember the first thing I did was to go to the Toronto Public Library and try to find a book on machine learning.
Alexandr Wang (02:23): That's right, how old were you?
Ilya Sutskever (02:24): I was 16.
Alexandr Wang (02:25): 16.
Ilya Sutskever (02:26): Yeah. And then when I went to the University of Toronto, and I sought out machine learning professors, I found Geoff Hinton, I discovered neural networks, and neural networks felt like the right thing because it's a very different way of writing code. Normally, you write code and you can kind of think it through and understand it, whereas with a neural network, you write an equation, a complicated equation inside a loop, and then you run the loop. And good luck figuring out what it does precisely, and that connects to neural nets not being interpretable. But you could also argue that the difficulty of understanding what neural networks do is not a bug but a feature.
Ilya Sutskever (03:11): Like, we want to build intelligence, intelligence is not simple to understand. We can't explain how we do the cognitive functions that we do, how we see, how we hear, how we understand language. So therefore if computers can produce objects that are similarly difficult to understand, not impossible, but similarly difficult, it means we're on the right track. And so all those things helped me converge on neural networks fairly early on.
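The "equation inside a loop" description can be made concrete with a minimal sketch. This is not code from the talk: the linear model, the squared-error loss, and the names `train`, `lr`, and `steps` are all assumptions chosen for illustration, and a real neural network has many more parameters than the two used here.

```python
# Illustrative only: a two-parameter model y = w * x + b fitted by gradient descent.
# The "equation" is the gradient of the mean squared error; the "loop" repeats the update.
def train(data, steps=2000, lr=0.05):
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y          # prediction error on one example
            grad_w += 2 * err * x / n      # d(loss)/dw
            grad_b += 2 * err / n          # d(loss)/db
        w -= lr * grad_w                   # step against the gradient
        b -= lr * grad_b
    return w, b

# Hypothetical data generated from y = 3x + 1.
data = [(x, 3.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0]]
w, b = train(data)
```

With only two parameters the result is easy to inspect; the difficulty of interpretation described above arises when the same kind of loop updates millions of parameters at once.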
Alexandr Wang (03:43): What year was it when you sort of remember initially getting excited about neural networks and being pretty convicted?
Ilya Sutskever (03:49): Like early 2000s. I started working with Geoff Hinton in 2003 so quite a while ago now.
Alexandr Wang (03:54): Long before I mean, obviously the craze kind of started around 2010.
Ilya Sutskever (03:58): That's right.
Alexandr Wang (04:01): I think this is a common theme whenever you look at sort of anybody who works like in any field that becomes very big, but there's a long stretch of like wandering in the desert, maybe is one way to put it.
Ilya Sutskever (04:12): Yeah, I mean, definitely lots of perseverance is required, because you don't know how long you will need to stay in the desert. You've just got to endure and that's very helpful.
Alexandr Wang (04:23): And did you expect like, I mean, obviously, today, neural networks do some pretty incredible things like did you expect back in 2003 or early 2000s that like in your lifetime you would see sort of the things that we're seeing now with AI machine learning.
Ilya Sutskever (04:38): I was hoping, but I did not expect it. Back then the field of AI was on the wrong track. It was in a mindset of rejection of neural networks, and the reason for that is that neural networks are difficult to reason about mathematically while other stuff you can prove theorems about. And there's something very seductive and dangerous about proving theorems about things. Because it's a way to showcase your skill but it's not necessarily aligned with what makes the most progress in the field.
Ilya Sutskever (05:15): But I think that neural networks are as successful as they are precisely because they're difficult to reason about mathematically. And so anyway my earlier hope was to simply convince the field that they should work on neural networks rather than the other stuff that they were doing. But then when computers started to get fast, my level of excitement about their potential increased as well.
Alexandr Wang (05:38): Yeah. And so what are your aspirations today? Like, in your lifetime what's the thing you... I mean, I think it's obvious from the OpenAI mission set but-
Ilya Sutskever (05:46): Exactly right. The hopes are much larger now. I think we can really try to build not only really powerful and useful AI, but actually AGI, make it useful, make it beneficial, and make it so that it will be used to solve a large number of problems and create lots of amazing applications. That's what I hope to see happen.
Alexandr Wang (06:14): And then obviously along the way, you had been doing a lot of this research and doing a lot of groundbreaking work at Google. And then you sort of left and started OpenAI with Sam Altman and Greg Brockman and a bunch of others. What were kind of goals with starting OpenAI at the outset? What was sort of the initial conception and the initial vision and what did you hope to accomplish by starting sort of a new lab?
Ilya Sutskever (06:42): So there were multiple motivations on my end for starting OpenAI. The first motivation was that I felt that the way to make the most progress in AI was by merging science and engineering into a single whole, into a unit to make it so there is no distinction or as little distinction as possible between science and engineering. So that all the science is infused with engineering discipline and careful execution. And all the engineering is infused with the scientific ideas. And the reason for that is because the field is becoming mature, and so it is hard to just do small scale tinkering without having a lot of engineering skill and effort to really make something work.
Ilya Sutskever (07:37): So that was one motivation, I really wanted to have a company that will be operating on this principle, and another motivation was that I came to see AI technology in a more sober way. I used to think that AI will just be this endless good. And now I see it in a more complex way where there will be a lot of truly incredible, inconceivable applications that will improve our lives in dramatic ways. But I also think that there will be challenges, I think that there will be lots of problems that will be posed by the misapplication of AI and by its peculiar properties that may be difficult for people to understand.
Ilya Sutskever (08:25): And I wanted a company that will be operating with this awareness in mind and it will be trying to address those challenges as best as possible by not only working on advancing the technology, but also working on making it safe. And also working on the policy side of things, as much as is rational and reasonable to make the whole be as useful and as beneficial as possible.
Alexandr Wang (08:50): Totally, I think it's something we agree on. I think one thing that is very obvious to me is that AI is something that is going to... Which countries have access to AI technology, and the ways in which they use them are going to define how the world plays out over the course of the next few decades. I think that's the path we're on as a world.
Ilya Sutskever (09:10): That's right, among many other things.
Alexandr Wang (09:14): This thing that you mentioned around sort of bringing together the science and engineering, I think it's quite profound, I think for a few reasons, because one is that first of all, I think a lot of the best, most incredible innovative things happen oftentimes from sort of blurring the lines between disciplines, like Apple was one of the best examples where from the very beginning, they were always like, "Hey, we're blending hardware and software and that's our special sauce." And obviously, it's produced some incredible things.
Alexandr Wang (09:43): And I think a lot of other research labs they operate in a very sort of scientists tell the engineers what to do mindset, which is counterproductive because you really need to understand both very well to understand what kind of the limits of the technology are.
Ilya Sutskever (09:56): Yeah, that's right. And on that point, you may even say isn't it obvious that the science and engineering should be together, and on some level it is. But it just so happens that historically it hasn't been this way, there's a certain kind of taste like empirically it has been the case in the past, less so now. But people who gravitate to research would have a certain taste that would also make them less drawn to engineering, and vice versa.
Ilya Sutskever (10:27): And I think now, because people are also seeing this reality on the ground to do any kind of good science, you need the good engineering, then you have more and more people who are strong in both of these axes.
Alexandr Wang (10:38): Totally. Yeah, and I think that, switching gears a little bit to kind of the GPT models like this is a great illustration. Because the GPT models are impossible without incredible engineering, that sort of... But yeah they still require novel research, they still require novel science to be able to accomplish. And they've obviously been some of the biggest breakthroughs in the field of AI as of late and sort of blown open many people's imaginations about what AI can accomplish, or at least increase people's confidence that AI can accomplish incredible things.
Alexandr Wang (11:10): I'm kind of curious about originally at OpenAI when you guys were... You've been working on these language models for some time, what were the original sort of research inspirations behind it, and what were the original sort of things that led you all to say, "Hey, this is something that's worth working on, worth scaling up, worth continuing to double down on"?
Ilya Sutskever (11:28): So there have been multiple lines of thinking that led us to converge on language models, there has been an idea that we believed in relatively early on that you can somehow link understanding to prediction. And specifically to prediction of whatever data you give to the model, where the idea is, well let's work out an example. So before diving into the example, I'll start with the conclusion first, the conclusion is that if you can make really good guesses as to what's going to come next, you can't make it perfectly, it's impossible. But if you can make a really good guess, you need to have a meaningful degree of understanding.
Ilya Sutskever (12:18): In the example of a book, suppose that you read a book and it's a mystery novel. And in the last chapter, all the pieces are coming together, and there is a critical sentence, and you start to read the first word and the second word, now you say, "Okay, the identity of some person is going to be revealed," and your mind is honing in on like it's either this person or that person, you don't know which one it is. Now, maybe someone who read the book and thought about it very carefully says, "You know, I think it's probably this person. Maybe that, but probably this." So what this example goes to show is that really good prediction is connected to understanding.
Ilya Sutskever (12:56): And this kind of thinking has led us to experiment with all kinds of approaches of, hey, can we predict things really well? Can we predict the next word? Can we predict the next pixel and study their properties? And through this line of work, we were able to get to... We did some work before the GPTs, before the Transformers were invented, and built something that we call the sentiment neuron, which is a neural net that was trying to predict the next word, sorry, the next character in reviews of Amazon products.
Ilya Sutskever (13:35): And it was a small neural net, because it was maybe four years ago. But it did prove the principle that if you predict the next character well enough, you will eventually start to discover the semantic properties of the text. And then with the GPTs we took it further, we said, "Okay, well, we have the Transformer, it's a better architecture, so we have a stronger effect." And then later, there was the realization that if you make it larger, it will be better. So let's make it larger, and it will be better.
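The next-character objective can be sketched in a few lines. What follows is not the sentiment neuron model, which was a neural network: it is a simple bigram count model, swapped in here only to show what "predicting the next character" means and how prediction quality is scored. The review-like strings and the names `fit_bigram` and `avg_nll` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def fit_bigram(text):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts

def avg_nll(counts, text, vocab):
    """Average negative log-likelihood per predicted character (lower is better)."""
    total = 0.0
    for prev, nxt in zip(text, text[1:]):
        row = counts[prev]
        # Add-one smoothing so unseen transitions keep a nonzero probability.
        p = (row[nxt] + 1) / (sum(row.values()) + len(vocab))
        total -= math.log(p)
    return total / (len(text) - 1)

# Hypothetical training and held-out text standing in for product reviews.
train_text = "this product is great. this product is bad. " * 20
held_out = "this is great. this product is bad."
vocab = sorted(set(train_text))

model = fit_bigram(train_text)
model_loss = avg_nll(model, held_out, vocab)
uniform_loss = math.log(len(vocab))  # loss of guessing every character uniformly
```

Comparing `model_loss` with `uniform_loss` shows how much the model has learned about the text. The claim in the conversation is that driving this same kind of loss down with a far more capable model forces it to pick up semantic properties such as sentiment.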
Alexandr Wang (14:05): Yeah, there's a lot of great nuggets in what you just mentioned. I think first is the elegance of this concept, which is like, "Hey, if you get really good at predicting the next whatever, get really good at prediction, that obligates you to be good at all these other things," if you're really good at that, and I think it's probably like underrated how that required some degree of vision because it's like, early on you try to get really good at predicting things and you get the sentiment neuron which is cool, but that's like a blip relative to what we obviously have seen with the large language models.
Alexandr Wang (14:43): And so that I think is significant and I think the other significant piece is what you just mentioned, which is kind of scaling it up, right? And I think you guys had released this paper about the scaling laws you found as you scaled up compute, data, and model size in concert with one another. But I'm kind of curious, what's the... Obviously there's some intuition where it's just like, "Hey, scaling things up is good and you see great behaviors." What's kind of your intuition behind, if you think from now over the next few years, or even the next few decades, what does scaling up mean? Why is it likely to continue resulting in great results and what do you think the limits are, if any?
Ilya Sutskever (15:31): The limits of scaling? I think two statements are true at the same time. On the one hand it does look like our models are quite large. Can we keep scaling them up even further? Can we keep finding more data for the scale up? And I want to spend a little bit of time on the data question because I think it's not obvious at all.
Alexandr Wang (15:53): Yeah.
Ilya Sutskever (15:58): Traditionally because of the roots of the field of machine learning, because the field has been fundamentally academic, and fundamentally concerned with discovering new methods, and less with the development of very big and powerful systems, the mindset has been someone builds, someone creates a fixed benchmark. So a data set of a certain shape of certain characteristics. And then different people can compare their methods on this data set. But what it does is that it forces everyone to work with a fixed data set.
Ilya Sutskever (16:38): The thing that the GPTs have shown in particular is that scaling requires that you increase the compute and the data in tandem at the same time. And if you do this then you keep getting better and better results. And in some domains, like language, there is quite a bit of data available. In other, maybe more specialized subdomains, the amount of data is a lot smaller. And that could be the case, for example, if you want to have an automated lawyer. So I think your big language models will know quite a bit about language, and they will be able to converse very intelligently about many topics. But they may perhaps not be as good at being a lawyer as we'd like. They will be quite formidable, but will they be good enough? This is unknown, because the amount of data there is smaller.
Ilya Sutskever (17:27): But anytime where data is abundant, then it's possible to apply the magic deep learning formula, and to produce these increasingly good and increasingly more powerful models. And then in terms of what are the limits of scaling? So I think one thing that's notable about the history of deep learning over the past 10 years is that every year people said, "Okay, we had a good run, but now we've hit the limits." And that happened year after year after year, and so I think that we absolutely may hit the limits at some point but I also think that it would be unwise to bet against deep learning.
Alexandr Wang (18:09): Yeah, there's a number of things I want to dig in here because they're all pretty interesting. One is this just I think this... You certainly have this mental model that I think is quite good, which is kind of like, hey Moore's Law is this incredible accelerant for everything that we do, right? And the more that there's Moore's law for everything, Moore's Law for different inputs that go into the machine learning lifecycle, we're just going to push all these things to the max, and we're going to see just incredible performance.
Alexandr Wang (18:41): I think it's significant because as you mentioned about this data point, it's like, hey if we get more efficient at compute, which is something that's happening, we get more efficient at producing data or finding data or we're generating data, we get more efficient... Obviously there's more efficiency out of the algorithms, all these things are just going to keep enabling us to do the next incredible thing and the next incredible thing and the next incredible thing.
Alexandr Wang (19:05): So first, I guess like we've talked about this a little bit before so I know you agree with that but what do you think is... Where do you think... Are there any flaws in that logic, what would you be worried about in terms of how everything will scale up over the next few years?
Ilya Sutskever (19:22): I think over the next few years, I don't have too much concern about continued progress. I think that we will have faster computers, we will find more data and we will train better models. I think that is... I don't see particular risk there. I think moving forward we will need to start being more creative about, "Okay, so what do you do when you don't have a lot of data?" Can you somehow intelligently use the same compute to compensate for the lack of data? And I think those are the questions that we and the field will need to grapple with to continue our progress.
Alexandr Wang (19:57): And I think this point about data is the other thing I want to touch on, because this is something that obviously at Scale we focus on. And I think that with the large language models, thankfully, because you can leverage the internet and the fact that all this data exists and has been accumulating for a while, you can show some pretty incredible things, and in all new domains you need efficient ways to generate lots of data. And I think that there's this whole question, which is, how do you make it so that each ounce of human effort that goes into generating some data produces as much data as possible?
Alexandr Wang (20:30): And I think that's something that we're passionate about, and that I think we talked a little bit about: how do you get like a Moore's law for data? How do you get more and more efficiency out of human effort in producing data? And that might require novel new paradigms, but it is something that I think is required. For this lawyer example that you mentioned, we have a pretty finite set of lawyers. How do we get those lawyers to produce enough data so you can create some great legal AI?
Ilya Sutskever (20:59): Yeah, that's right. And so the choices that we have are, first, to improve our methods so that we can do more with the same data or do the same with less data. And the second is, like you say, to somehow increase the efficiency of the teachers.
Alexandr Wang (21:20): Yep.
Ilya Sutskever (21:21): And I think both will be needed to make the most progress.
Alexandr Wang (21:23): Well, I really think Moore's law is instructive. To get these chips performing better, people try all sorts of random crap. And then the end output is that you have chips that have more transistors. And if we think about it like that, the question is whether we have models that perform better with certain amounts of data, or certain amounts of teaching, and how do we make that go up?
Ilya Sutskever (21:48): I'm sure that there will be ways to do that. I mean, for example, if you ask the human teachers to help you only in the hardest cases, I think that will allow you to move faster.
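One common way to send only the hardest cases to human teachers is uncertainty sampling, a form of active learning. The sketch below is an assumption about how such a selection step might look, not a description of any method used at OpenAI or Scale: it takes a model's predicted probabilities for a binary label and picks the examples whose predictions sit closest to 0.5. The function name `hardest_cases`, the `budget` parameter, and the probabilities are hypothetical.

```python
def hardest_cases(probabilities, budget):
    """Return indices of the `budget` examples the model is least sure about.

    For a binary label, a predicted probability near 0.5 means the model
    cannot decide, so those examples are ranked first.
    """
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:budget]

# Hypothetical model outputs for six unlabeled examples.
probs = [0.97, 0.52, 0.10, 0.45, 0.88, 0.61]
to_label = hardest_cases(probs, budget=2)  # examples 1 and 3 go to the human teacher
```

The confident predictions (0.97, 0.10) are left alone, so the limited human effort is spent where the model's own signal is weakest.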
Alexandr Wang (22:01): I want to switch gears to one of the offshoots of the large language model efforts, which is particularly exciting, especially to me as an engineer, probably most people who spend a lot of time coding, which is Codex, which demonstrated some pretty incredible capabilities of going from sort of natural language to code and sort of being able to interact with a program in a very novel new way. I'm kind of curious for you, what excites you about this effort? What do you think are the reasonable expectations for what Codex and Codex like systems will enable in the next few years? What about far beyond that? And ultimately why are you guys so excited about it?
Ilya Sutskever (22:47): So for some context Codex is pretty much a large GPT neural network that's trained on code. Instead of training to predict the next word in text, it's trying to predict the next word in code. The next I guess token in code, and the thing that's cool about it is that it works at all. I don't think it's self evident to most people that it will be possible to train a neural net in such a way so that if you just give it some representation of text that describes what you want, and then the neural network will just process this text and produce code. And this code will be correct, and it will run.
Ilya Sutskever (23:45): And that's exciting for a variety of reasons. So first of all, it is useful, it is new, it shows that... I'd say code has been a domain that hasn't really been touched by AI too much, even though it's obviously very important. And it touches on aspects where AI has been, today's AI, deep learning has been perceived as weak, which is reasoning and carefully laying out plans and not being fuzzy.
Ilya Sutskever (24:19): And so it turns out that in fact, they can do quite a good job here, and like one analogy, one distinction between Codex and language models is that the Codex models, the code models, they allow you to... They in effect they can control the computer. It's like they have the computer as an actuator. And so it makes them much more useful. You can do so many more things with them and of course, we want to make them better still, I think they can improve in lots of different ways, those are just the preliminary code models.
Ilya Sutskever (24:59): I expect them to be quite useful to programmers and especially in areas where you need to know random APIs. Because these neural networks they... So one thing that I think is a small digression, the GPT neural networks, they don't learn quite like people. A person will often have somewhat narrow knowledge in great depth. While these neural networks they want to know everything that exists, and they really try to do that. So their knowledge is encyclopedic. It's not as deep, it's pretty deep, but not as deep as a person.
Ilya Sutskever (25:39): And so because of that, these neural networks in the way they work today, they complement people with their breadth. So you might say, "I want to do something with a library I don't really know." It could be some existing library, or maybe the neural network had read all the code of all my colleagues, and it knows what they've written. And so I want to use some library I don't know how to use, the network will have a pretty good guess of how to use it. You'd still need to make sure that what it said is correct. Because such is its level of performance today, you cannot trust it blindly especially if the code is important.
Ilya Sutskever (26:20): For some domains where it's easy to undo anything that it writes, any code that it writes, then I think you can trust it just fine. But if you actually want to have real code, you want to check it. But I expect that in the future those models will continue to improve, I expect that the neural network, that the code neural networks will keep getting better and I think the nature of the programming profession will change in response to these models. I think that in a sense it's a natural continuation of how in the software engineering world we've been using higher and higher level programming languages. First people wrote assembly, then they had Fortran, then they had C, now we have Python, now we have all these amazing Python libraries and that's a layer on top of that.
Ilya Sutskever (27:12): And now we can be a little bit more imprecise, we can be a little bit more ambitious, and the model, the neural network, will do a lot of the work for us. And I should say, I expect something similar to happen across the board in lots of other white collar professions as well. If you think about the economic impact of AI, there's been an inversion. I think there's been a lot of thinking that maybe simple robotic tasks will be the first ones to be hit by automation. But instead we're finding that the creative tasks, counterintuitively, seem to be affected quite a bit.
Ilya Sutskever (27:55): If you look at the generative neural networks, in the way you generate images now, you can find on Twitter all kinds of stunning images being generated. Generating cool text is happening as well, but the images are getting most of the attention. And then there are things like code, things like a lot of writing tasks. These are white collar tasks, and they are also being affected by these AIs. And I do expect that society will change as progress continues. And I think that it is important for economists and people who think about this question to pay careful attention to these trends, so that as technology continues to improve, there are good ideas in place to help us in effect to be ready for this technology.
Alexandr Wang (28:42): Yeah, there's a number of really, again, interesting nuggets in there. I think one of the big ideas behind Codex or Codex-like models is that you are able to go from human language to machine language. And you kind of mentioned like, all of a sudden the machine is an actuator. And I think many of us, when we think about AI, we think of it like the Star Trek computer, you can just ask the computer, and it'll do things, and this is a key enabling step. Because if all of a sudden you can go from how we speak, how humans speak, to things that a machine can understand, then you bridge this key translation step.
Alexandr Wang (29:21): So I think that's super interesting. Another thing, this inversion that you just mentioned, is super interesting, because one of my beliefs on this is like, "Hey, the reason that some things have become much easier than others is all a product of availability of data." There are some areas where there just exists lots and lots of digital data that you can kind of suck up into the algorithms and they can do quite well. And then in things like robotic tasks or setting a table or all these things that we've had a lot of trouble building machines to do, you're fundamentally limited by the amount of data you have, first just by the amount of data that's been collected so far, but also because you can only have so much stuff happening in the real world to collect that data. I'm curious, how do you think about that? Or do you think it's actually something intrinsic to the creative tasks that is somehow more suited to current neural networks?
Ilya Sutskever (30:20): I think it's both. I think it is unquestionably true, that with the... We can take a step backwards. At the base of all AI progress that has happened at least in all of deep learning, and arguably more is the ability of neural networks to generalize. Now, generalization is a technical term, which means that you understand something correctly or take the right action in a situation that's unlike any situations that you've seen in the past, in your experience.
Ilya Sutskever (31:04): And so a system generalizes better if, from the same data, it can do the right thing or understand the situation correctly in a broader set of situations. And so to make an analogy, suppose you have a student at a university studying for an exam, that student might say this is a very important exam for me, let me memorize this... Let me make sure I can solve every single exercise in the textbook. Such a student will be very well prepared, and could achieve a very, very high grade in the exam.
Ilya Sutskever (31:39): Now consider a different student who might say, "You know what, I don't need to know how to solve all the exercises in the textbook as long as I got the fundamentals, I read the first 20 pages, and I feel I got the fundamentals." If that second student also achieves a high grade in the exam, that second student did something harder than the first student, that second student exhibited a greater degree of generalization. Even though the questions were the same, the situation was less familiar for the second student than for the first student. And so our neural networks are a lot like the first student, they have an incredible ability to generalize for a computer but we could do more.
Ilya Sutskever (32:30): And because their generalization is not yet perfect, definitely not yet at a human level, we need to compensate for it by training on very large amounts of data. That's where the data comes in. The better you generalize the less data you need slash the further you can go with the same data. So maybe once we figure out how to make our neural networks generalize a lot better, then in all those small domains where we don't have a lot of data, it actually won't matter, the neural network will say, "It's okay, I know what to do well enough, even with this limited amount of data."
Ilya Sutskever (33:08): But today, we need a lot of data. But now, when it comes to the creative applications in particular, there is some way in which they are especially well suited for our neural networks. And that's because generative models play a very central role in machine learning. And the nature of the generations of generative models is somehow analogous to the artistic process. It's not perfect, it doesn't capture everything and very... And there are certain kinds of art which our models cannot do yet.
Ilya Sutskever (33:42): But I think this second connection, the generative aspect of art, and the ability of generative models to generate new plausible data, is another reason why we've seen so much progress in generative art.
Alexandr Wang (33:59): So yeah, it's a really interesting thing, because it's a shade of what you'd mentioned at the very beginning, which is that part of the reason that maybe we shied away from neural networks at the start was that they're so hard to explain. And that aspect where we can't prove theorems about them, they do things that we can't quite explain. Maybe it's what naturally allows them to be better suited for creative pursuits which we also can't explain very well.
Ilya Sutskever (34:24): Yeah, I think that's definitely possible as well.
Alexandr Wang (34:29): Yeah. One thing I'm really... Some of the other recent advancements from OpenAI were CLIP and DALL-E, both super interesting examples of being able to go between modalities from text to images. I would love to understand from you, what do you think is the kind of significance of sort of what is shown by CLIP and DALL-E? Where do you think that research goes over time and what excites you about it?
Ilya Sutskever (35:02): Yeah, so for context. So CLIP and DALL-E are neural networks that learn to associate text with images. So DALL-E associates text with images in the generative direction. And CLIP associates text with images but in the direction of perception, of going from an image to text versus going from text to image. And both of them are cool because they are simple, it's the same old recipe, you just say, "Hey let's take a neural net that we understand really well and just train it on a big collection of text and image pairs and see what happens." And what happens is something very good.
Ilya Sutskever (35:51): So the real motivation with CLIP and DALL-E was just to dip our toes into ways of combining the two modalities, because one of the things that people want in the future, I think it's fairly likely that we wouldn't want neural nets, sorry, we wouldn't want our future AIs to be text-only AIs. Like we could, but it seems like it's a missed opportunity, I feel like so much stuff is going on in the visual world. And if it's not difficult to have a neural net really understand the visual world, then why not.
Ilya Sutskever (36:26): And then also, hopefully by connecting the textual world to the visual world they will understand text better, they will have a much... The understanding of text that they will learn by also being trained on images might become a little bit closer to ours, because you could make an argument that maybe there is a distinction between what people learn, and what our artificial neural networks learn, because people see and they walk around, they do all those different things. Whereas our neural networks, the text ones are only trained on text.
Ilya Sutskever (37:00): So maybe that means that something is missing, and maybe if you bring the training data to be more similar to that of people, that maybe we'll learn something more similar to that of people as well. So those were some of the motivations to study these models. And it was also fun to see that they worked quite well and especially now with I would say most recently CLIP has been enjoying quite some degree of popularity, and people have figured out how to invert it to generate high resolution images, and have a lot of fun with it. And actually I think that's the most for me emotionally satisfying application of the past maybe few months.
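As a rough illustration of what "associating text with images" can look like in code, here is a minimal Python sketch of a symmetric contrastive objective over paired image and text embeddings, the general style of objective used for CLIP-like models. The conversation itself does not describe the training objective, so treat this as an assumption brought in for illustration. The embeddings are hand-written toy vectors rather than outputs of any real encoder, and every function name here is hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(image_embs, text_embs, temperature=0.1):
    """Entry [i][j] is the scaled cosine similarity of image i and text j."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
            for i in imgs]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row k is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[k]
    return total / len(logits)

def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Average of the image-to-text and text-to-image cross-entropies."""
    logits = similarity_matrix(image_embs, text_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(transposed))

def best_caption(image_emb, text_embs):
    """Perception direction: index of the text most similar to the image."""
    img = normalize(image_emb)
    scores = [sum(a * b for a, b in zip(img, normalize(t))) for t in text_embs]
    return scores.index(max(scores))

# Toy embeddings (made up): image k is paired with text k.
image_embs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
text_matched = [[0.9, 0.0, 0.1], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
text_shuffled = [text_matched[1], text_matched[2], text_matched[0]]

loss_matched = contrastive_loss(image_embs, text_matched)
loss_shuffled = contrastive_loss(image_embs, text_shuffled)
print(loss_matched < loss_shuffled)
```

The loss is lower when each image sits next to its own caption than when the captions are shuffled, so minimizing it pulls true pairs together. `best_caption` shows the image-to-text direction attributed to CLIP above; going from text to an image, as DALL-E does, requires a generative model and is not covered by this sketch.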
Alexandr Wang (37:42): Yeah, I think that... It's an interesting point that you mentioned, which is, hey, the more that we can... There's this concept of embodied AI which is hey if you have an AI that actually will go and experience things like humans do, maybe you get interesting behaviors, and the more we can go in that direction with stuff like multimodal learning is super interesting.
Alexandr Wang (38:06): Another thing I wanted to touch on is I think you mentioned something quite profound, which is, hey it's a very simple, the algorithm, the use of the algorithm is very simple. And in this case like producing the data sets and getting the data right is from my perspective what really enabled a lot of the incredible results. I don't know how you think about that, and how you think that defines future or similar areas of research?
Ilya Sutskever (38:33): Yeah, I'd say it's definitely a true statement that the field of deep learning, especially the academic branch, not so much the applied branch, but the academic branch of the field has underestimated the importance of data, because of the mental framework where the data is given to you in the form of a benchmark and your goal is to create a method that is better than the other existing methods.
Ilya Sutskever (39:03): And the reason it was important for that framework to have a fixed data set is so that you could compare which method is better. And I think that really did lead to a blind spot where very many researchers were working very hard on this pretty difficult area of can we improve the model more and more and more, while leaving aside the very large improvements that are possible by simply saying, "Hey, let's get much more data on the table."
Ilya Sutskever (39:31): I think now at this point people appreciate the importance of data a lot more. And I think that at this point it's quite proven that domains with a lot of data will experience a lot of progress.
Alexandr Wang (39:48): Do you think that more just conceptually, do you think that over the next few years more of the advancements, more of the cool things that we'll see in AI will come from sort of innovating more on the data side or innovating more on the algorithm side?
Ilya Sutskever (40:02): I wouldn't want to... I prefer to not make that distinction. I think making that distinction is... It's useful for some things, and maybe let me roll with that distinction. I think both will be important, I expect us to... I believe very firmly that very huge progress is possible from algorithmic, from methodological improvements. We are nowhere near being as efficient as we can be with our compute. We have a lot of compute, we know how to make use of it in some way, which is already a huge achievement compared to before.
Ilya Sutskever (40:42): Here's a historical analogy. You may remember that 10 years ago or so the only way to productively use huge amounts of compute was through these embarrassingly parallel computations, like MapReduce. That was literally the only idea anyone had, there weren't any interesting ways in which you could use huge amounts of compute. Now, with deep learning we have one such way. You say the compute needs to be a little bit more interconnected but it is possible to have a large amount of compute and do something useful with it.
Ilya Sutskever (41:15): But I don't think that we have figured out the best formula for making use of this compute. I believe that there will be better formulas and you'll be able to go much further with the same amount of compute. With that said, I am also very, very confident that a lot of progress will happen from data. I mean, I'm a big believer in data, and I think that there can be so many different things, you could find new sources of data, and you can filter it in all kinds of ways and maybe apply some machine learning to improve it. There can be lots of, I think there are lots of opportunities there. And I expect the combination of all of these, when they come together, to feed off each other. And I think that will lead to the most progress.
Alexandr Wang (41:51): Yeah and one question, because to go back to this question of compute, you somewhat answered it, which is like, "Hey, we're going to have significantly more efficient algorithms." But if you kind of take this sort of the concept of scaling to the limit that we mentioned before, which is like hey, if you scale everything to the limit you'll get great performance. At some point, you're building supercomputers that are just far too... They're way too big or they're way too expensive or whatnot to be practically feasible. How do you think... Do you think as a field we get around that by getting way better at using our compute, or do you think there is some like fundamental limit of compute that we need to kind of think about when we think about scaling laws?
Ilya Sutskever (42:30): So there probably does exist an ultimate way of using the compute. I don't think we found that way yet. I think we can improve the efficiency of our methods, the usefulness they derive from our compute, the extent to which they generalize. I think there are lots of opportunities that we haven't explored yet. I also agree with you that there will be physical limits and economic limits to the size of computers that one could build, and I think that progress will consist of pushing on all these axes.
Ilya Sutskever (43:06): Now, one other thing I want to mention is that there is a huge amount of incentive to find these better methods. Think about what happens if you can find a method that somehow allows you to train the same neural net with half the compute. It's huge, it's like you've doubled the size of your compute. So the amount of research there will only keep on increasing. And I believe that it will lead to success, it will take some time perhaps, but I'm sure we'll find really, really powerful ways of training our neural nets and setting them up in far more efficient ways, far more powerful ways than what we have right now. And then of course we want to give it all, to give those better ways all the compute it deserves, and all the data it deserves.
Alexandr Wang (43:50): Totally. Well one interesting concept just kind of related to this that I'm curious to hear your thoughts on. One thing that I think we've talked a little bit about before, one of the concepts of neural networks being embedded in the name is that, hey, you basically have this very simple model of a neuron. And that simple model of a neuron is then allowing you to form these like brain-like algorithms.
Alexandr Wang (44:16): And the reality is that neurons are actually very weird, there's like a lot of behaviors that we don't even fully understand mathematically, we only have like weak empiricals to understand. And so what do you think is the likelihood that our current model of neurons, which are these simple, sort of [inaudible 00:44:32] functions, are the path to produce something that will resemble a brain or things that resemble neurons? Or do you think chances are that, like, we're on kind of this interesting but slightly wrong path with how we're designing these networks?
Ilya Sutskever (44:51): So my view is that it is extremely unlikely that there is anything wrong with the current neurons. I think that they might not be the best neurons perhaps, but even if we didn't change them, we'll be able to go as far as we need to go. Now, there is still an important caveat here, which is how do you know how many neurons you need to reach human level intelligence. Well, you can say maybe we can look at the size of the brain. But it may be that each biological neuron is like a small supercomputer, which is built out of a million artificial neurons.
Ilya Sutskever (45:26): So maybe you will need to have a million times more artificial neurons in your artificial neural net to be able to match the brain. That's a possibility. I don't think it will happen. I don't think it will be that bad. But I would say worst case that would be the meaning of it. In other words you will need a lot more artificial neurons for each biological neuron for it to be possible for them to simulate the biological neurons.
Alexandr Wang (45:50): Yeah I know. It's very interesting, it's one of these interesting questions around how much do we try to emulate biology, or are we implicitly emulating biology by having these small neurons that then create these super neurons that behave strangely?
Ilya Sutskever (46:07): Yeah, I wouldn't even say that we are trying to emulate biology, but we are trying to be appropriately inspired by it in the right way. Emulating biology precisely I think that would be challenging and unwise. But kind of using it as ballpark estimates, I think can be quite productive.
Alexandr Wang (46:26): One interesting thing, and it's so funny, there's just like thing after thing after thing that OpenAI has worked on recently that's been interesting, but the instruct series of models, I think, was interesting in that it demonstrates potentially an interesting paradigm for how humans and models will collaborate in the future. Why did you guys work on the instruct series? Why is it interesting and what excites you about it?
Ilya Sutskever (46:51): Yeah, so the instruct models are really important models, I should explain what they are, and what the thinking there was. So after we trained GPT-3, we started to experiment with it and try to understand what it can do. And we found that it can do a lot of different things and it has a real degree of language understanding. But it's very, very not human-like. It absolutely doesn't do what you ask it to, even if you can, even if it can. And so one of the problems that we've been thinking about a lot is alignment, which is if you have a very powerful AI system, how do you make it so that it will faithfully fulfill your intent, faithfully and correctly.
Ilya Sutskever (47:40): And the more powerful the AI system is, the greater its generalization and reasoning ability and creativity, the more important the alignment of the system becomes. Now GPT-3 is a useful system, it's not profoundly smart, but it is already interesting, and then we can ask the simpler question of how to align GPT-3, how to build a version of GPT-3 such that it will try, to the best of its abilities, as faithfully as possible, to do what you ask it to do. And so that led to the creation of instruct models, and that's basically a version of GPT where you just say, "Hey, do X, please do Y, I want Z." And it will do it, and it's super convenient and people who use this model love it. And I think it's a great example where the more aligned model is also the more useful one.
Alexandr Wang (48:33): Yep. I want to... Thinking about GPT-3 and large language models, I'd be remiss to not talk about sort of some of the challenges that are associated with them. In particular, GPT-3 is trained on, and the future GPTs are trained on, just huge and huge amounts of data. And there's a lot of engineering involved in how do you engineer this data to work super well? What do you think are some of the challenges, especially as we try to figure out ways to use more and more data in these machine learning systems? How do you deal with the fact that in any sea of large data there's going to be weird biases or weird qualities that might be tough to sift through and manage?
Ilya Sutskever (49:16): Yeah, lots of facets to this question. So this is something that we've been thinking about at OpenAI for a long time. Even before training GPT-3 we anticipated those issues would come up. And there are challenges, I can mention some of the strategies that we've pursued to address them. And some of the ideas that we have that we are working on to address those issues even further. So it is indeed the case that GPT-3 in particular, and models like it, they learn from the internet, and they learn the full range of data that's expressed on the internet.
Ilya Sutskever (50:00): Now, the model isn't... And it's okay. To a first approximation, we knew that this would be an issue. And one of the advantages of releasing the model through an API is that it makes it possible to deal with such challenges around misuse or around us noticing that the model is producing undesirable outputs. Now, of course there's a very difficult challenge of defining what that means. But suppose sidestepping that, you have some degree of ability to monitor what the model does, and correct it behind the API. That's number one.
Ilya Sutskever (50:40): Number two, more positively, so here's the positive note: the model doesn't just learn the good or the bad, the model learns everything. And GPT-3 and models like it, they are not actually attached to any particular one, let's say, view that's expressed on the internet. A model like GPT-3, all these generative GPT models, they are like the ultimate shapeshifters, they can be whatever you want them to be. So the better they are, the easier they become to control. You can say, "Please answer the following question the way some famous person X would answer." You can pick whatever famous person you want. And the model will in fact do a faithful reproduction of how that person would answer.
Ilya Sutskever (51:24): So now you have a very fine-grained degree of control over the model's output, and you can go even a step further. One work that should be done was to study the efficiency of fine-tuning these models into what kind of behavior you'd like them to have. Now, of course none of these questions tell you which behavior is desirable. But what I'm talking about is, there exist ways to modify the model's behavior. And maybe there is an analogy here as well. The analogy is a little bit of a stretch, but I think it's useful. Say you want a child to grow into a functional adult, a good member of society. Should the child only be exposed to good data? Or ultimately do you want the child to be exposed to all the data that exists, but also have an explicit understanding of which of it you want them to exhibit and which of it you don't?
Ilya Sutskever (52:22): And I think this is the world that we'll converge to with respect to these language models, modulo the rather non-obvious question of precisely what it should be: how exactly do we want it to behave? There's a question of how it should behave, and there is a question of how to make it behave that way, the engineering question, and on that one we're getting a handle.
Alexandr Wang (52:48): Yeah, I mean, to your point there's certainly... It is a fact of life that these large datasets are going to contain some sort of chaos, noise, things that we may not maybe from a moral perspective, we may not want put in front of a model. But I think what you're going towards is like, just from a pragmatic engineering perspective, it's going to be impossible to be precious about the data we put into the algorithm. And so let's acknowledge that, that's going to happen, and then be precious about defining the performance of the algorithm post training on it, and making sure that you're able to sort of mold the algorithm into the exact, to perform in a way that you would like.
Ilya Sutskever (53:30): So I think that is the more productive approach long term. However, I think it is possible to be precious about the data as well, because you can take the same models and filter the data or classify it and decide what data you want to train on. And I expect that as people train these models, we will... And as we train these models in fact, we are experimenting with these different approaches to find the most practical and efficient way of having the model be as reasonably behaved as possible.
Alexandr Wang (54:02): No, I think the results in fine tuning the algorithms are pretty exciting, because it just means that there's more degrees of freedom to be able to produce algorithms to behave in the ways that we want.
Ilya Sutskever (54:14): That's right and this is a property of models that are better. So this is a counterintuitive thing. The weaker the model is, the less good your language model is, the harder it is to control. Whereas the bigger it is, the better it is, the faster it is to fine-tune. The more responsive it is to prompts which specify one kind of behavior versus another. So, in a sense I expect that at least this flavor of the problem that we just discussed will become easier as the models become more powerful and more accurate.
Alexandr Wang (54:50): Yeah. So we weaved through a bunch of very interesting topics. I want to take a chance to kind of zoom out. We started this talk by talking about how originally when you started working on neural networks, the optimistic version was, hey the field is going to pay attention to neural networks. And obviously now we believe more in something that resembles AGI as like the optimistic version of what the field can accomplish.
Alexandr Wang (55:18): I think that in the zoomed in approach, the past few years have just been this sort of incredible period of new breakthroughs, very new interesting things as a result of AI. When we kind of zoom out to a longer time horizon, what are the advancements of AI that you think are sort of, I shouldn't say on the horizon, but are just around the corner, and the ones that you think are going to have very meaningful implications for how the world will operate.
Ilya Sutskever (55:49): So I think that of the advances that are around the corner, I think that simple business as usual, the kind of mundane progress that we've seen over the past few years, will continue. And I expect our language models, our vision models, our image generation, code, text to speech, speech to text, I expect all of them will improve across the board, and I think that will be impactful. And I would say with these generative models in particular, it is a little harder to reason about what kind of applications become possible once you have a better code model, or a better language model, because it's not just better at one thing, it develops qualitatively new capabilities, and it unlocks qualitatively new applications.
Ilya Sutskever (56:37): And I think it's going to be just a lot of them, I think that deep learning will continue to grow and to expand and I think that more and more, there'll be a lot more deep learning data centers. And I think we'll have lots of interesting neural networks trained on all kinds of tasks. I think medicine, biology, by the way, those I think those ones will be quite exciting.
Ilya Sutskever (56:59): I read that right now the field of biology is undergoing a revolution in terms of their ability to get data. I'm not an expert, but I think it's at least not false what I'm saying. So I think training neural networks there will be quite amazing, I think it will be interesting to see what kind of breakthroughs in medicine it will lead to, or I should mention AlphaFold, I think, is also an example there. So I think it's going to be, I think the progress is just going to be stunning.
Alexandr Wang (57:31): To kind of close, I mean we have an incredible AI community that's with us today and is probably very excited to figure out how they can ensure that AI sort of has a positive future, that we have a positive AI future. What do you think are the things that sort of everyone in the audience can take away from this conversation and work on that will help ensure that we have a positive future with AI.
Ilya Sutskever (57:57): So I think there are many things which are worth thinking about. I'd say the biggest one is probably to keep in mind that AI is a very powerful technology. And that it can have all kinds of applications, and to work on applications that are exciting and that are solving real problems, that are the kind of applications that improve people's lives, work on those as much as possible. And also work on methods to try to address the problems that exist with the technology, to the extent they do. And that would mean some of the questions about bias and undesirable outputs and possible other questions around alignment and questions that we haven't even discussed in this conversation. So I'd say those two things, work on useful applications. And also, whenever possible, work on reducing the harms, the real harms, and work on alignment.
Alexandr Wang (58:55): Awesome. Well, thank you so much. But I would be remiss to not thank OpenAI and the organization for all the incredible contributions to the field of AI over the past many years. And thank you so much again for sitting down with us.
Ilya Sutskever (59:09): Thank you for the conversation. I really enjoyed it.