Building the Next Generation of NLP Applications With Richard Socher
Richard Socher is the founder and CEO of You.com, the search engine that puts you in control — Your sources. Your time. Your privacy. Richard previously served as the Chief Scientist and EVP at Salesforce. Before that, Richard was the CEO/CTO of AI startup MetaMind, acquired by Salesforce in 2016. Richard received his Ph.D. in computer science at Stanford, where he was recognized for his groundbreaking research in deep learning and NLP. He was awarded the Distinguished Application Paper Award at the International Conference on Machine Learning (ICML) 2011, the 2011 Yahoo! Key Scientific Challenges Award, a Microsoft Research Ph.D. Fellowship in 2012 and a 2013 "Magic Grant" from the Brown Institute for Media Innovation, and the 2014 GigaOM Structure Award. He also served as an adjunct professor in the computer science department at Stanford. Outside of work, Richard enjoys paramotor adventures, traveling, and photography.
Richard Socher is the fifth most-cited researcher in Natural Language Processing (NLP) and the CEO of you.com, an ad-free, privacy-preserving search engine. Previously, he was the Chief Scientist (EVP) at Salesforce. He joins Scale AI CEO Alexandr Wang in a fireside chat to discuss how leaders should apply research to production applications. Richard discusses the shrinking gap between pure research and the building of revenue-generating products, how enterprises can avoid the common pitfalls of AI adoption (regardless of industry sector), and the areas of research he is most excited about. He also discusses how the changing shape of the internet led him to create You.com, a new type of intelligent and privacy-preserving internet search engine.
Nika Carlson (00:22): Next up, we're excited to welcome Richard Socher. Richard Socher is the fifth most-cited researcher in NLP and the CEO of you.com. Prior, he was the chief scientist and EVP at Salesforce, where he led teams working on fundamental and applied research and across product AI platform. Before that, he was an adjunct professor at Stanford's computer science department and the founder and CEO/CTO of MetaMind, which was acquired by Salesforce. He has a PhD from the CS department at Stanford. Richard is joined by Alex Wang, CEO and founder at Scale AI. Alex, over to you.
Alexandr Wang (01:06): Thank you so much for sitting down with us today, Richard. Super excited to be chatting with you, and welcome to TransformX.
Richard Socher (01:13): Great to be here. Thanks for having me.
Alexandr Wang (01:16): Awesome. So you've worked on so many interesting things throughout your career. You've been a professor at Stanford. You've been the chief scientist at Salesforce, focused on both fundamental and applied research, and then now you're the founder and CEO of you.com, as well as in your free time an avid photographer. So, just taking a big step back, what inspires and drives you to do the work that you do?
Richard Socher (01:43): Yeah, great question. I personally really love AI. I think AI is a super fascinating field, because it captures everything from highly philosophical questions of what makes us human, which is to large [inaudible 00:01:59] intelligence, and how do we use it for doing something good and something impactful, all the way to very concrete use cases of helping people in medicine, helping people with access to good information, helping people with summarizing the world's information now that it's all accessible to everyone, and it's a little bit overwhelming. There's just so many cool applications that AI has that are very concrete, very in the now. And so AI has been the overarching theme, if you will, of a lot of my work, starting in academia, going to startups, working in a big company, and now back in a consumer startup.
Alexandr Wang (02:43): Awesome. Yeah, and MetaMind, which you started and which was later acquired by Salesforce, was one of the very early AI startups. And so what were some of the challenges or the opportunities that you saw when you originally started MetaMind, and how has the whole field progressed since then?
Richard Socher (03:02): Yeah, great question. So the field has changed so much. When I started MetaMind, there was no TensorFlow yet. It was very hard for companies to actually use AI to be able to create their own algorithms, and actually have a production level system. And it's funny because now I realize every... We worked on like 10 different things. We helped people create data sets, upload data sets very easily, make it fully automated, so you can compare lots of different models in the background, then show you comparisons of different models, then help you do error analysis. Then actually help you run it in production with three lines of Python code that was automatically load balanced and all of that. And what's interesting is you now have companies that just tackle one of these aspects that are five to 10X more valuable, and in some cases, even more than that, than what we were acquired for.
Richard Socher (04:08): So it's clear that the space has exploded like crazy. Every single one of the AI aspects is now its own company, and in many cases, many different companies. And so the world has changed significantly, right? A lot of people want to work on the cool AI models, but when it comes down to it, there's just a lot of really hard work in getting AI into a production level system. And I'm sure you know this very well at Scale, but there's just so much work in labeling the data, cleaning the data, [inaudible 00:04:40] it, then actually just standard engineering work and load balancing things and dealing with spikes and so on. And a lot of folks want to say they're working on AI, but they don't want to do most of these really hard aspects of creating the AI system, and in many cases, the answer is get more data, and then make your model larger.
Richard Socher (05:03): And both of those are hard process optimizations and then hard engineering problems, and they're not like, oh, let's add a new different layer type and fiddle around with objective functions or learning rate optimizations. That work is of course important too, but in terms of the overall amount of work that needs to be done in that aspect, it's very, very minor. And I think as these systems become more and more commodity, it'll become less and less important and probably the majority of the work is in everything around the model.
Alexandr Wang (05:38): Yeah, no, it is a super interesting perspective, and I'm excited to dig in a little deeper in a second when we talk through exactly what you were saying about the scaling of deep learning. From your time at Salesforce, I think one of the more unique things about the AI strategy that you helped lead was that it focused not only on applied machine learning, having helped build Einstein, which has enabled lots and lots of enterprises to use AI effectively, but you also led a team of researchers that was continuing to publish papers, do basic research, and continue very formative work. So how did you at the time think about the balance between research and applied ML? You alluded to it at the start, where both are very interesting to you intrinsically. And then how did they play off of one another, how did they support each other, and how did you think about that?
Richard Socher (06:35): Yeah, it's a great question. I could probably talk about it for hours. So let me know when I talk for too long, but it's... I loved it and it was a really amazing time because it became an incubation engine too of new ideas, new algorithms and new products, and that was a lot of fun. And what's fascinating about AI nowadays is that the gap between pure, publishable academic research and actual products is actually getting smaller and smaller, largely thanks to deep learning, where you have amazing libraries and you can hack up a quick prototype. You can play around in the research world with new ideas, but then, because of a lot of the tooling around it, very quickly get it into production. And so that was a continuing process of how do we improve prototype to production pipelines?
Richard Socher (07:30): How do we iterate with customers very quickly, and eventually have the actual impact on the real product and generate revenue? And so I think that was the balance. And so sometimes I on some level also have to balance how much do we fund research just for research's sake, that is fund marketing. Why does a search engine company have to do 3D protein folding, right; AlphaFold. It's unclear. It's cool. It brings the world forward. And similarly Salesforce had ProGen, a model that generates protein sequences that are actually viable in the real world, which is amazing, and it very much brings forward a lot of our stakeholders and the world in general, but not everything had to be applicable right away in a product the next quarter.
Richard Socher (08:27): But you gain the freedom to do the pure research by having obvious impact with the rest of the research group where there are more applied problems, which also were interestingly right in that intersection. So, [inaudible 00:08:42], for instance, chat bots. Chat bots need a lot of natural language understanding. So super interesting research question, how do you do a chit chat, versus like goal-oriented dialogue systems, where chit chat being systems where you just ramble on, like how is the weather today? Oh, you have a pet. Oh, I hate my day. These kinds of talks versus like, what do you want to achieve? I want to reset my password. Okay. Well, here are the steps. How can I help you with each of these steps? Right. And you have very goal-oriented dialogue.
Richard Socher (09:14): Those are interesting problems, both in the research domain still, to be able to increase the scope and the abilities of these chatbot systems. But for some very simple use cases, like resetting my password, getting my login back, how do I say my order never arrived, my delivery didn't get here on time, it was damaged, there are a bunch of simple things that you can already do in a product right now that would help companies massively save time and money. And so there's a beautiful flow in several of these cases between the [inaudible 00:09:45] research and the applied one. And I think that that is something that companies need to think about. I think the pure research is a feature that you unlock as a company once you have a clear path to exist half a decade out, because you have revenue, you have growth, you have product market fit and so on. Once you're big enough, you can start thinking much longer term and then you enable yourself to have a research group.
Alexandr Wang (10:14): Yeah, no, super interesting. I actually want to, in the same vein, pivot to a question that I think maybe a lot of us ask in the AI community, which is there's just been in general some incredible research, a lot of which you've had a part... played a pretty big part in, such as ImageNet or across models [inaudible 00:10:33] learning and much more. But when we look broadly at the whole industry, there's so many companies, and I would say almost most companies have yet to become early adopters of AI and machine learning despite this incredible progress from a research perspective of what we can accomplish today versus a decade ago. Well, why do you feel that's the case? And what do you think is holding back applied progress so much?
Richard Socher (10:59): Yeah, great question. The future is here. It's just not equally distributed, as the famous saying goes. I think there... It's interesting and it's a great question, and I think the answer is it's complicated. I think there are some industries where AI has the potential to fundamentally change the entire industry, and there will be, at some point, a cutoff between the haves and the have-nots. And so an example for that is in biology, where if you can have an AI generate proteins in the future, you'll just have massive opportunities, and if you don't, you're going to be irrelevant fairly quickly. Likewise, I think in self-driving, if someone and when someone in the next decade or two actually gets a broader and broader self-driving capability, eventually the whole automotive industry will change.
Richard Socher (11:55): And people are going to wonder why they have to sit and concentrate for hours on a highway when other people can chill out and work on their laptop next to them in another car. And so I think there are some industries where AI will fundamentally change that industry. Every industry is going to get changed by AI, but I think a lot of them are in that second bucket where no matter what industry you're in, you're probably going to have sales, service, and marketing capabilities, and you can basically make your entire business much more efficient in terms of growth and revenue generation, and also putting out inefficiencies. And so when you think about that, like a chatbot, if you have a thousand people that work in your service centers or your call centers, then you will benefit a lot if you have an AI that can start taking over these really mundane, repetitive questions of recovering your password and such.
Richard Socher (12:57): If you're in sales and you have an AI that can help you with automated outreach that is personalized, it's a really great draft based on the person's LinkedIn, where they went to university, it puts in a nice personalized draft, and then you just fix it up and click send, and you're like 5X faster as a sales person. And then you get a nice AI that tells you like, oh, this deal is most likely to convert based on all these other features, then you can make your sales process 5% more efficient. And then marketing, of course, there's tons of data that gets generated, and now basically every company can make the entirety of their stack five to 10% more efficient in each one.
Richard Socher (13:35): And then it's just overall much more competitive, compared to their competitors that aren't using AI in every different aspect of their business. And so those are the two different buckets. Are you in the bucket where fundamentally something's going to change, or are you just there to make your whole business more efficient with AI? And that's hard. Both of these examples are hard spaces for companies to be in, because there's just a lot of inertia in large organizations, right? Here's another example, not as related to AI: if you say you need to move to electric, but you have an entire division that works on the carburetors, or that works on the transmission or the engine, none of those will be relevant.
Richard Socher (14:27): You're going to have people in the organization who are like, I don't want to move to electric because I don't want to be irrelevant. Right. And so it's just, it's complicated to make big companies change. And then once they realize they need to change and they're going to adopt it, then of course it becomes a massive data problem. So it changes from a vision and strategy question to a data problem where most of the time they don't have the processes to collect the data. And once they have the process to collect the data, then it depends on that process whether it's labeled data or unlabeled data. And in some cases, there are specific solutions where the standard process already creates the data, the labels, for you.
Richard Socher (15:10): So for instance, if you use a CRM, my old work at Salesforce, where you for instance want to automate how to classify an email. An email comes in, and you're like, this is a sales email or a service email, or a marketing email. Which department or which person, which group should respond to this email? In those cases, if you have to do it after the fact, you have to label it all. But if you have a system where you already label it by your standard process of, well, this email comes in, instead of just forwarding it, you actually select it in your CRM to be like, this is a sales email, it should be answered by this person.
Richard Socher (15:51): Then that data can be used right away to train your AI system. But most companies just don't have those setups yet, because that thinking, that data about every aspect of their business is important to collect, important to maintain, and to have metadata about, just hasn't really arrived yet with a lot of organizations.
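To make the email-routing point concrete, here is a minimal sketch, not Salesforce's actual system. It assumes the CRM has already stored each email together with the department an agent selected when routing it. The records, the department names, and the `train` and `predict` helpers are hypothetical, and a small Naive Bayes word-count model stands in for whatever model a real product would use.

```python
import math
from collections import Counter, defaultdict

# Hypothetical records, as a CRM might store them when an agent routes an email.
routed_emails = [
    ("please send me a quote for 50 licenses", "sales"),
    ("what is the price of the enterprise plan", "sales"),
    ("i cannot reset my password", "service"),
    ("my order arrived damaged", "service"),
    ("unsubscribe me from the newsletter", "marketing"),
    ("can i get the webinar slides from your campaign", "marketing"),
]

def train(records):
    """Collect per-department word counts (multinomial Naive Bayes statistics)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in records:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the department with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        # Add-one smoothing so unseen (word, department) pairs are not impossible.
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for word in text.lower().split():
            if word in vocab:  # skip words never seen during training
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

model = train(routed_emails)
print(predict(model, "i forgot my password"))
```

The point of the sketch is that no separate labeling pass is needed: the routing decision made during normal work is the label.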
Alexandr Wang (16:11): Yeah. Yeah, no, totally. I think that you're right, which is that the future is kind of... or the end state obviously is very exciting, but there are very practical hurdles around getting everything in place to be able to even set up the pipeline, set up the models in the first place and get into a continuous improvement loop, which is just a daunting task for most firms. A lot of the research that the Salesforce AI group did was, as you mentioned, around chatbots, around natural language understanding and natural language processing, and it's always been an area where there's been incredible breakthroughs in recent years with the large language models.
Alexandr Wang (16:59): One question I have for you and we alluded to this is that like, yeah, the massive trend in machine learning and deep learning has been towards doing bigger models and getting more data to fuel those algorithms. I'm curious, what are your thoughts on this mega trend in AI research, and how do you think about... contextualize that with a lot of the research that you were doing, which was maybe more fundamental focused on sort of like not just scaling the system up?
Richard Socher (17:29): That's a great question. I actually love language models. We've worked on language models for many years at Salesforce too and had contextual vectors, which led to ELMo, which led to BERT. And then I think language models are just a really clever task because it's so hard that you basically want infinite amounts of training data. And what's beautiful also is the training data is free in this case because it's just text, it's just the next word of any domain that you have any interest in. And you're then in that regime where you want as much training data as possible. But once you have that much training data, it also starts to make sense to have massively large models with billions and billions of parameters. And so it creates all kinds of new, interesting, hard problems.
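The remark that the training data is "free" can be illustrated with a small sketch: any raw text already contains its own labels, because the target is simply the next word. The `next_word_pairs` helper, the whitespace tokenization, and the fixed context window are simplifying assumptions for illustration, not how any particular language model is built.

```python
def next_word_pairs(text, context_size=3):
    """Turn raw text into (context, next_word) training examples."""
    tokens = text.lower().split()
    pairs = []
    for i in range(1, len(tokens)):
        # The context is up to `context_size` preceding words; the label is token i.
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs

corpus = "the training data is free because it is just text"
pairs = next_word_pairs(corpus)
for context, target in pairs[:4]:
    print(context, "->", target)
```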
Richard Socher (18:21): And then once you are in that space, my hunch is the neural substrate, the actual algorithms will matter less and less over time. It's basically just some general function approximator that you can very nicely train and [inaudible 00:18:36] efficient manner. And whether it's a massive LSTM or a massive transformer or a massive new model, after that, it almost doesn't matter. What becomes more and more important is how you train this model and how you make it get better over time. Right now, the world still struggles with true multi-task learning. We published on that in a paper called Natural Language Processing Decathlon, or decaNLP, where you had a single model that does 10 different NLP tasks, all to very high accuracy. The cool thing with language models is that they will very quickly do something, which is amazing.
Richard Socher (19:15): Like without any training, they'll produce something that works. But once you really care about that particular output, like let's say you're a translation company and you really care about that perfect translation, which we're still not fully there yet as an organization, as a society or research in general, you're going to want to fine-tune that model to be really good for that task. And if you really care about answering emails, then it will be a different fine-tuned system. And if you then care about question answering over general web data, it will, again, be a different model. And so no one has yet cracked this really hard problem of can those final really good models be the same, because they're always like, we started from just word vector sharing, then with CoVe, we started to have contextual vectors and have more of the encoder being shared.
Richard Socher (20:09): And the encoder's really great. Now, with decaNLP, we had the decoder be shared too. And I've seen language models as also part of that system where we basically share more and more of the decoder as well. And now the question is okay, but can you really not fine-tune it and have a separate model at the end of that training process every time, because if we crack that nut, as a research community, then we can start all basically doing [inaudible 00:20:42] or cumulative research on a single model. And that I think is one way to get to really truly intelligent AI systems. So yeah, so those are some of my thoughts on these large language models. I love the direction. I think it's great. I think we can go even further with it.
Alexandr Wang (21:00): Yeah, no, well, I think you bring up an interesting point, which is like, hey, part of the magic of the large language models is that they can do something in a lot of scenarios, which is pretty surprising. There weren't algorithms in the past that could do something in lots and lots of scenarios, but I think you bring up a good point, which is like right now, the paradigm for getting it to be really good at some of these tasks is to fine-tune the model. You fork off a little model in some sense. Well, what do you think are the exciting directions to actually get to true multitask learning where it continuously is improving? Obviously a lot of the research community is focused on really scaling them up. Maybe as we scale them up, we just continue getting very new, interesting behaviors that we didn't expect before. But what do you think about how we actually get to that eventual state over time, of effective multitask learning on one big network?
Richard Socher (21:55): Yeah. To be honest, no one has the full solution. We proposed a couple in the decaNLP paper, but no one has been able to beat it in a year and a half, two years now. And it's really hard because so much of everything we do in academia as well as in industry is focused on take a specific task, a specific data set, a specific metric, an objective function and just go hard with that. Right. And all our thinking, our tools, they're all like fixed objective function and just optimize your metric on one particular test set. And so I think it'll require a lot of changes across the stack of research in AI to really push these... Stanford now calls them foundation models, to be a true foundation that you can build on top of.
Richard Socher (22:50): And just to understand, imagine every time someone wanted to build a new operating system, they'd have to start from scratch, instead of just taking what's out there and making it a little bit better like with Linux. It would take years and we would be nowhere near as far if open source software couldn't fork and then start and improve it from that last state, but instead had to go way far down to some previous state and then fork from there. So I think there's a lot of power that we can have if we had a single really strong model. But yeah, to be honest, if I knew the answer, I would've published it. I think there'll be more research in what I call increasingly complex objective functions, where you start with simple objective functions, like predict the next word.
Richard Socher (23:40): And then as you train it, you learn these new skills and add new tasks to it. And then there's something called catastrophic forgetting or catastrophic interference, which, as we actually learned in our research, is not quite as catastrophic, because it's the top layers near the output that start to get worse. If, say, you train a large language model on translation and then train it on question answering afterwards, it will start to forget translation, but actually it quickly comes back. There's a beautiful human analogy here in that if you know how to ride a bike really well, and you don't ride for five years, you might not be able to do a crazy bunny hop right away again, but you're still going to be able to ride quickly, if you just do a little bit of extra training.
Richard Socher (24:27): And so neural networks are similar in that they actually don't catastrophically fully forget something when they're larger or deeper, but they forget the last couple of layers. And so we need to think about training regimes, where you keep bringing back old information to try to not forget it and have newer ways of optimization. And my hunch is the research community will start focusing more on ways like how to think about objective functions that get increasingly more complex, and also think about ways to optimize better over time.
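The idea of training regimes that "keep bringing back old information" is often called rehearsal or replay. The sketch below is a generic illustration of that idea rather than the method from the speaker's research: while training on a new task, each batch also carries a few examples from an earlier task. The `replay_batches` function, the batch size, and the replay fraction are assumptions chosen for the example.

```python
import random

def replay_batches(new_task, old_task, batch_size=8, replay_fraction=0.25, seed=0):
    """Yield batches of new-task examples mixed with replayed old-task examples."""
    rng = random.Random(seed)
    n_replay = int(batch_size * replay_fraction)  # slots reserved for old data
    n_new = batch_size - n_replay
    new_task = list(new_task)
    rng.shuffle(new_task)
    for start in range(0, len(new_task), n_new):
        batch = new_task[start:start + n_new]
        # Bring back a few old examples so the earlier skill keeps being practiced.
        batch += rng.sample(old_task, min(n_replay, len(old_task)))
        rng.shuffle(batch)
        yield batch

# Toy stand-ins for two tasks; real examples would be (input, target) pairs.
translation = [("translation", i) for i in range(100)]
question_answering = [("qa", i) for i in range(30)]

batches = list(replay_batches(question_answering, translation))
for batch in batches:
    counts = {task: sum(1 for t, _ in batch if t == task) for task in ("qa", "translation")}
    print(counts)
```

With these settings every batch holds six question answering examples and two replayed translation examples; the right mixing ratio in practice is an open tuning question.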
Alexandr Wang (25:04): Yeah. And I think that's part of these general trends, right? For example, initially a lot of the community was focused on not only fixed tasks, but also fixed data sets. Now I think we're, as a community, realizing, especially with the large language models and other kinds of work, that, hey, if you relax the fixed data set constraint, that actually helps you do new, interesting things. And then over time, if we figure out ways to relax, [inaudible 00:25:31] task constraint. To what you're saying, like with the decaNLP work and getting more complex objective functions [crosstalk 00:25:39].
Alexandr Wang (25:40): Yeah, and the other part that I want to double down on from what you just said is that, yeah, I think one of the things that has really held back machine learning progress from being as fast as, say, what we've seen in the software world is this lack of ability to cleanly build on top of past work, because, unlike an operating system, which is very logically designed, a lot of work in machine learning is relatively hard to explain, and there don't necessarily exist [inaudible 00:26:15] abstraction layers between one piece of work and another piece of work. And so I think that it's actually quite insightful what you're saying, which is if we get really good at multitask learning, then perhaps that creates the conditions under which we can all start building upon one stable foundation, or one stable core.
Alexandr Wang (26:34): Cool. Well, so excited about you.com and really excited to be able to use the product. To switch gears back or to just take a big step back, if you look at kind of... Obviously You.com is one manifestation of looking at what the current advancements in AI are and how those are going to impact the ways in which you use technology in the future. But taking a big step back just in a broader view, if you look at the current frontier of AI research and what's being accomplished, what do you see around the corner from an industry perspective? What do you think business leaders need to be thinking about right now in terms of how the current advancements in AI research are going to start affecting their industries five to 10 years down the line?
Richard Socher (27:25): Yeah. So I think that goes to some of our earlier conversation, that you first have to decide which bucket you're in. Is AI going to fundamentally change your entire industry, like with protein engineering or self-driving cars, and there are probably a few others in that space, or is it just going to make every aspect of your industry more efficient? And then depending on which one you're in, you have to decide as a company, what is your core competency, right? If you're an insurance company, clearly you need to have some top AI people on identifying risk, and being able to really classify risk to a very high degree of accuracy, because that's the core area of your business. But for having a service chatbot, you might be able to rely on an outside partner, and for your marketing automation you might have a different partner.
Richard Socher (28:17): And of course, with my Salesforce hat on, that's probably... It should be Salesforce if you're a reasonably large company. And so, yeah, so that's the main decision. Is it your core competency? Is it going to change your entire industry fundamentally? Or is it something you can rely on partners for? And my hunch is in some cases it might even be a mix, right? An insurance company will have to have some really solid risk assessment AI, but also can rely on some external things [inaudible 00:28:47].
Alexandr Wang (28:48): Yeah. And actually this is more from a... just from an intellectual curiosity perspective, but obviously one of the areas that language models have been applied to is code. And again, it's interesting that it even works at all in being able to generate code. What do you think is the future of software engineering? We talked a bit about how some of these other industries get totally upended. There's obviously concern or potential that software engineering is changed forever. What do you think?
Richard Socher (29:20): Yeah. I think we already see that software development in practice has changed significantly over the years, and that people less and less have to write new kinds of hard algorithmic code. And it's more and more about, can you quickly use all the libraries out there? Are you aware of where you have to actually innovate and where you can rely on existing software packages? Right. And then we'll see the increasing usage of SaaS and B2B offerings where you have programs or companies that help you with testing, that help you with automated scaling of some models, that help you with login and analytics.
Richard Socher (30:05): And there's so many different companies out there, and knowing when you can rely on some external service, when you can rely on some existing open source software package, and when you really have to actually innovate and build something yourself, that I think is super crucial, and I think things like Codex will amplify that. There will be small chunks where you don't even have to know anymore. It's like, okay, how did I sort by dates in Python? Just ask that of the AI. And you will see soon that your search engine of the future will incorporate similar ideas of just giving you good answers and letting you go back into coding.
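For reference, the "sort by dates in Python" question mentioned here has a short answer. This is one common way to do it, assuming the dates arrive as ISO-formatted strings; the `orders` records and the `sort_by_date` helper are made up for the example.

```python
from datetime import datetime

# Hypothetical records with ISO-formatted date strings.
orders = [
    {"id": "a", "placed": "2021-10-06"},
    {"id": "b", "placed": "2020-01-15"},
    {"id": "c", "placed": "2021-03-02"},
]

def sort_by_date(records, key="placed", fmt="%Y-%m-%d"):
    """Return a new list ordered from oldest to newest."""
    # Parse each string into a datetime so the comparison is by date, not by text.
    return sorted(records, key=lambda r: datetime.strptime(r[key], fmt))

ordered_ids = [order["id"] for order in sort_by_date(orders)]
print(ordered_ids)
```

Passing `reverse=True` to `sorted` would give newest first instead.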
Richard Socher (30:44): Actually, a big focus for us at first is to be a great search engine for developers in particular. So, you'll be very happy to see some of the examples very related to your question just now when you.com comes out. But I think we'll see that continuously... There'll be fewer and fewer people that are needed on the very low level, hard algorithmic questions, and more and more people who are just able to put together existing packages and create something that has no bugs, and that still, despite the levels of abstraction, can function at a very high [inaudible 00:31:23] kind of basis.
Alexandr Wang (31:26): Yeah. But again, and as you mentioned, [inaudible 00:31:29], in some sense, that's just a continuation of a trend, right? As time has gone on, when Google was getting started, it took probably 100X as many engineers as if you were to build something similar today, because the trend of software engineering is this great ability to be able to build on top of new abstractions. And this is potentially the next generation or next iteration of that mega trend.
Richard Socher (31:56): That's right. Yeah. And you see that in basically everything in the stack. Nowadays it's so easy to build a new web app that scales reasonably well in like nothing, right? You can create your own website with something like Squarespace. You don't even need to code anymore. Right. And you'll see that, I think, with AI too. A lot of people have tried to build something like Squarespace for AI, and certainly we at MetaMind did too, but AI still is not quite there yet. There's still so many different use cases out there, and it's hard to build something that's just drag and drop. You click, click, click, and boom, you have some unique, special AI algorithm. But my hunch is we're going to get there in the future too, where you just need a few clicks, and eventually all these leaky abstractions will get less leaky and you'll be able to understand better, and basically have automated away more and more of that complexity.
Richard Socher (32:47): And we see that already now with a lot of AI tooling companies like Weights & Biases, Scale, companies like Hugging Face, where I'm also an investor and they've been doing phenomenally well, that basically are able to abstract away a lot of complex natural language processing models and immediately just provide you an out-of-the-box NLP system that would've taken previous teams years to implement. Right. And so we see that ML tooling space already. And this is actually one area that I love about AI. As a community, it's a very forward-looking community that always tries new experiments on how to publish papers.
Richard Socher (33:31): We're not doing these super old-school journals anymore. Then we might move to [inaudible 00:33:35] conferences, and now it's just on arXiv right away. And sometimes people forget in that community very quickly too. Like you basically see papers that only cite things that happened in the last year or two, and everything else is ancient and already forgotten. But overall I think it's a beautiful way to just speed up innovation across the whole community.
Alexandr Wang (33:58): Yeah, no, and in that vein of how quickly the whole community is moving, what are some of your top predictions on what the future of AI research or the future of AI even looks like over the course of the next five years? What do you think... We talked a little bit about multi-task learning and more complex objective functions. But what are some of the other things that excite you or that you expect to happen?
Richard Socher (34:25): Yeah. So I think there are three aspects. So one is I think AI will just be a more common general tool that more people will get to use. And because of that, more tooling will exist for it, more automation of different aspects of it: running experiments, collecting data, labeling data, helping you understand problems in the data, biases that you may have in your data sets, and issues you have dealing with distribution shifts where over time new things happen. You release a new product, your chat bot doesn't yet know how to respond to those kinds of questions. And having continuous integration tests, because as you automate more and more, harder and harder intellectual tasks that are still somewhat repetitive, you will need to have humans in the loop, but it'll be fewer humans that do harder and more interesting kinds of things.
Richard Socher (35:28): So in service, for instance, there will always be some complex system like, oh, my router with this firmware interacts with the computer and that firmware, and this code base, and now we're getting this error message. It's going to be hard for an AI to be able to answer all of those things and knowing the complex interactions and figuring out new bugs, right? The whole point of training data is you've seen something before, extrapolate from it to some degree, but you can repeat an existing process. If no one has seen that bug before, it's going to be hard for an AI to be able to know how to respond to it for quite some time. Really extrapolate and connect logic and fuzzy reasoning. That's kind of one of the... So yeah, so that's the first two things. We'll see more tooling; ML tooling.
Richard Socher (36:13): We see more use cases and automation around that. Then we'll see more [AI plus X 00:36:23], which is also my fund's thesis for investing, is like AI and AI X [inaudible 00:36:30] invest in a lot of specific verticals too. So there's ML tooling, and then there are verticals on top of it. So we'll see many more applications in healthcare, in automotive, in B2B, enterprise software, but also in lots of other, all kinds of different spaces like battery optimization and sales optimization for companies. It's like everything, every industry out there will be impacted by AI. And then the third point is the super long term. I think people will continue to strive towards AGI, and interesting artificial general intelligence problems.
Richard Socher (37:17): And I think there are three major road blocks that we have there, and I don't actually think enough people are working on them to... So we'll have to see. Hopefully we can motivate some people here. But one is this idea of increasingly complex objective functions. Babies are usually pre-born, wanting to [inaudible 00:37:34], wanting to have attention, wanting to move around, exploring things. And then at some point they want to talk more and understand language more, and then they are able to do that. And at some point, they can do all kinds of crazy back flips, and then they want an iPad and you're like, whoa, how did that happen? Right. And so, and that's the idea that you start with a predetermined sequence of relatively fixed objectives. Like most babies by month five, they have certain sets of skills, and by month seven, they have different, new sets of skills.
Richard Socher (38:04): And there's object permanence, that's something that really young babies don't yet have. And that's why they're like, wow, [inaudible 00:38:10] finger disappears or something, and intuitive physics. There are all kinds of interesting things in child psychology. And it seems like maybe we need something like that for an AI, where we have a sequence of predetermined and increasingly complex objectives. And at some point we need to, as a community, think about what it would look like for an AI to set its own objective function. And no one has worked on that. But I think it's a crucial bit. It's hard to have intelligence if all you're doing is optimizing what someone else told you to do your entire existence, and never having a thought or a goal that you set yourself. So, that's one aspect.
Richard Socher (38:53): Another one is hard multi-task learning. We talked about it before. Very few people are working on this hard thing. It's just really hard to say, oh, we do these 10 tasks almost as well or just as well as others. It's much easier to say, oh, we do this one task better than everyone else, and we ignore everything else. That's just how the community is currently set up, so it's hard to publish in multi-task learning, but we need to solve it. And then the third aspect towards AGI is in trying to combine this fuzzy probabilistic reasoning with truly logical reasoning, where you can reason over sets of numbers. What's kind of funny is you're always excited that language models can say, oh, what's 10 plus five. And it's like, wow, it says 15. And it's probably seen that somewhere in the training data, but you still can't like... and now every number, the number 10, instead of being one integer, is like this thousand-dimensional floating point [inaudible 00:39:47], and it's able to do that.
Richard Socher (39:50): But if you ask it what's 365.6 divided by 59.8, it couldn't do it, and even a simple calculator can, but the system which has so much computational power and so many parameters still can't do it, because it's never seen anything like that particular division before. And so there's this idea like how can you get the good stuff from the large language models and extrapolate, and they're getting better and better at the fuzzy reasoning that we do it [inaudible 00:40:17] more probabilistic statistical and so on, to then actually adjust it logically, and have that logic be somehow also learned, but then be able to extrapolate to arbitrary settings and things that are way outside of anything the model had seen, but are logically combined. Those are three areas that I'd love to see in more research settings towards AGI.
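[Editor's illustration: The contrast Richard draws between recalling "10 plus five" and computing an unseen division can be sketched in a few lines of Python. This is only a toy under stated assumptions, not anything described in the conversation: `MEMORIZED` is a hypothetical stand-in for answers a model has seen in training data, and `evaluate` and `answer` are illustrative names. The rule-based evaluator here is hand-written, whereas Richard's open question is how such logic could be learned.]

```python
import ast
import operator

# Map AST operator nodes to exact arithmetic functions.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr: str) -> float:
    """Evaluate a +, -, *, / expression by applying rules, not by recall."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expr, mode="eval").body)


# Hypothetical stand-in for facts a model has memorized from training data.
MEMORIZED = {"10 + 5": 15}


def answer(expr: str) -> float:
    # Recall the answer if it was seen before; otherwise compute it logically.
    if expr in MEMORIZED:
        return MEMORIZED[expr]
    return evaluate(expr)


if __name__ == "__main__":
    print(answer("10 + 5"))        # recalled from the lookup table
    print(answer("365.6 / 59.8"))  # never seen, so computed by rule
```

[The lookup alone fails on any expression it has not seen, while the evaluator extrapolates to arbitrary inputs because it composes rules. That difference is the gap between fuzzy recall and logical reasoning that the passage above points to.]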
Alexandr Wang (40:43): Well, that's a great call to action for this community, thinking about three key topics for the path to AGI, which is one, better multi-task learning, as we talked about. Two, better objective functions or more research on objective functions, and ultimately getting to the point where AI or machine learning generates its own objective functions. And the last one around better logical reasoning, or better ability to combine some of this fuzzy reasoning with logical capability. So with that, it's a great note to end on. Thank you so much for your time, Richard. And this was a super interesting conversation, a lot of really interesting technical and sociological thoughts here. So thank you so much.
Richard Socher (41:22): Thank you. Thank you, Alex. Thanks for having me. Great questions and have a great rest of your conference.