Growing With Open Source: From Torch to PyTorch With Soumith Chintala
Posted Oct 06 | Views 406
# TransformX 2021
# Keynote
Soumith Chintala
AI Researcher @ Facebook AI Research

Soumith Chintala is a Researcher at Facebook AI Research, where he works on high-performance deep learning. Soumith created PyTorch, a deep learning framework that has traction among researchers. Prior to joining Facebook in August 2014, he worked at MuseAmi, where he built deep learning models for music and vision targeted at mobile devices. He holds a Master's in CS from NYU, and spent time in Yann LeCun’s NYU lab building deep learning models for robotics, pedestrian detection, natural image OCR, and depth images, among others.


Soumith Chintala, creator of PyTorch and AI Researcher at Facebook AI Research (FAIR), discusses how open-source best practices can be used to run a successful development project with a collaborative community. He shares how these practices helped make PyTorch one of the most popular and widely used machine learning frameworks today. Join this session to learn how you can use tools, processes, and communities to run a highly effective open source project.


Speaker 1: Thank you, Alex. And thank you, Sam. Our next presenter is Soumith Chintala. Soumith is a researcher at Facebook AI, where he focuses on high-performance deep learning. He's also the co-creator of PyTorch, a deep learning framework that has gained significant traction, first among researchers, and now broadly, to become one of the most popular machine learning frameworks available today. Prior to joining Facebook, he developed deep learning models for music and vision for mobile devices. Soumith joins us today to discuss the last five years of ML frameworks and the future bets. Soumith, the stage is yours.

Soumith Chintala: Hi, I'm Soumith. I work on machine learning frameworks. I started the PyTorch project at Facebook, among other things, and I'm going to talk today about machine learning frameworks and how they've evolved along certain dimensions of interest. And within this framework that you can think about, I'm going to talk about how they will continue evolving going forward. And I talk about the future as a distribution of things. This talk is 30 minutes, and you could talk about the field for days. So obviously, the talk is going to be simplified in various ways. So please bear with me on that note. But I still think just talking about a few dimensions here and talking through machine learning frameworks and their evolution within these dimensions is going to be pretty useful. Okay, so let's start. I'm going to introduce three people, three personas.

Soumith Chintala: The first one is MODELER. A modeler is someone whose job it is to look at the data and assess its quality. They ask, "Hey, do I need more labels?" And then they start doing pre-processing or feature engineering. And then they pick some way to do machine learning, they build an architecture. And then they encode their priors into the learning, either via some trick of the architecture or some regularization scheme. And then they build a training pipeline. They do machine learning to solve some task, either of research interest or business interest. And then there is the second person I call PROD. And prod is typically the person who modeler goes to when they actually want to reliably ship something into some critical part of some task, so reliably ship it to what we generally call production. So prod usually tries to make sure you're able to version your models, so that in case something feels wrong they can roll back, and that you're able to version the data that comes in and goes into the models when they're trained.

Soumith Chintala: And they also generally make sure that all of the metrics that they monitor are within acceptable ranges, and they make sure that new models the modeler has given them are within acceptable ranges of performance to keep costs or power down. And they make sure to do that in coordination with the third person I call COMPILER. Now what does compiler do? Compiler's job is to map the models that the modeler has given, either while they're still training the models or when they enter production, as efficiently as possible onto hardware. That could be server hardware, that could be accelerators, that could be phones, that could be some embedded systems, that could be the Mars Rover, anything. So their job is to squeeze the best performance out of the models, measured maybe as performance per watt or performance per second or performance per dollar. That's pretty much it. Even though the term is compiler, they can even be a hardware implementer, someone who just builds new hardware somewhere like NVIDIA.

Soumith Chintala: So let's talk about the software stack. Don't forget the personas. But I'm going to just quickly talk a little bit about how the software stack has evolved over time. And that's kind of important. And then we will actually tie this to the personas. So before deep learning became popular, before 2012, you typically had a software stack that looked somewhat like this, where a lot of focus was on pre-processing, feature engineering, post-processing. And so you had domain-specific libraries for that. And for the machine learning models themselves, you had a very small interface to the software packages or libraries that built those machine learning models and trained them for you.

Soumith Chintala: So if you ever used XGBoost, or scikit-learn, or Vowpal Wabbit, you give some kind of configuration of what model you're building, what learning rate or regularizer, or how many trees are in the forest and so on. And once you build that config, you give that to a factory, and along with that you give your data in some pre-processed or clean form. And then the engine, the software engine that implemented a particular machine learning algorithm, just handles the entire stack of the training loop and all the implementation details of the model. And pre-2012, they mostly mapped to CPUs. So something like XGBoost would specialize a lot for gradient boosted trees to always have the best performance on CPUs, and do all kinds of tricks that are very specialized to boosted trees that make it go faster. And the one thing to recognize here is that the model in this context is typically a configuration that is generally small and usually readable by humans, and then a blob of weights that are stored in some blob format, maybe on disk or in memory.

Soumith Chintala: So, enter deep learning. In late 2012 deep learning got popular. Deep learning is nothing but neural networks, or differentiable learning. It got popular, and hence came the frameworks that enable modelers and compilers and prod to practice deep learning. So in the post-deep-learning world this is what the stack looks like. You have a very large API surface in the middle. Mainstream deep learning frameworks like PyTorch or TensorFlow have thousands of functions in their API. And these thousands of functions are strung together by modelers to build models, and those models come in all shapes and sizes. And below that you have data structures, typically tensors, say dense tensors or sparse tensors, and within dense tensors you can have layouts of memory that might make computation more or less efficient. And then you have a bunch of hand-optimized functions, typically written by high-performance computing experts, that map these APIs efficiently onto accelerator hardware.

Soumith Chintala: You have also, in the last few years, been seeing compilers pop up. So XLA or TorchScript or TVM are examples of compilers that take whole models described in the APIs of these frameworks. And they map them more efficiently to hardware than stringing together hand-optimized functions. And lastly, you typically have a distributed transport layer that enables these models to run on multiple devices at once or multiple machines at once. And on top of this API, you have domain-specific libraries that make it easy to train your models within particular domains. For example, you might have computer-vision-specific pre-processing or functionality that all computer vision people can use together. NLP, audio, they generally come in all flavors and sizes. But you also have high-level frameworks such as fast.ai or Keras or PyTorch Lightning, which try to bring back that pre-deep-learning convenience of quickly describing what you want to do, or quickly fitting your data to your model, instead of verbosely implementing everything manually.

Soumith Chintala: And then on top, you have prod tooling, such as TFX or TorchServe or SageMaker, and Spark AI is starting to have some tooling. So the general mainstream deep learning frameworks do a full vertical integration across the stack to make things pretty efficient. There are particular solutions by various parties that only focus on particular parts of the stack, and they interface cleanly with the rest of the stack. So one thing to recognize here is that in these post-deep-learning mainstream frameworks, PyTorch and TensorFlow, the model is described as code, typically code in some language. That is, it's not a configuration file or a JSON blob anymore. It's actually complicated code, which can have loops and the various structures that you typically associate with a programming language. And then the weights, they are just blobs of numbers that are stored somewhere.
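
To make the contrast between the two regimes concrete, here is a minimal sketch in plain Python. It is only an illustration, not code from any framework mentioned in the talk: `fit_from_config` and `model_as_code` are hypothetical names, and the "engine" is a toy linear regression trained by gradient descent. The first half treats the model as a small readable config plus a blob of weights; the second half treats the model as code whose loop length and branching depend on the input.

```python
import json

# Pre-deep-learning style: the model is a small, human-readable config
# plus a blob of weights. (Hypothetical config keys, for illustration.)
config = json.loads('{"model": "linear", "learning_rate": 0.1, "epochs": 500}')


def fit_from_config(config, xs, ys):
    """Hypothetical 'factory': the engine owns the entire training loop."""
    if config["model"] != "linear":
        raise ValueError("unsupported model type")
    w, b = 0.0, 0.0
    lr = config["learning_rate"]
    n = len(xs)
    for _ in range(config["epochs"]):
        # Gradients of mean squared error for y ~ w * x + b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return {"w": w, "b": b}  # the "blob of weights"


# Post-deep-learning style: the model is code. The number of steps and the
# branch taken depend on the data, so a static config cannot describe it.
def model_as_code(weights, sequence):
    state = 0.0
    for x in sequence:            # loop length varies with the input
        state = weights["w"] * state + x
        if abs(state) > 10.0:     # data-dependent early exit
            break
    return state
```

In the first style, versioning the model means versioning a config and a weights file. In the second, the function body itself is part of the model, which is the property that the talk says made life harder for compiler and prod.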

Soumith Chintala: So it wasn't always like this, that picture didn't always look like this. Just after deep learning got popular, you had various frameworks like cuda-convnet, which is the framework that started the revolution, and then Caffe, and then I used to use this framework called EBLearn. And they had a much smaller API surface. They had fewer data structures, they had only hand-optimized functions, they didn't have compilers, they typically didn't have distributed support, they didn't have much going on. And they didn't have an ecosystem of domain-specific libraries or utilities on top of them. And in that regime the model was still described as a config, like a protobuf or a JSON or custom-defined configuration files, plus weights. So it was still basically transitioning from that pre-deep-learning world, and that was what was most convenient. But you actually had counterexamples to those. Theano, which was actually very much ahead of its time, had the model being described as a symbolic graph. And a large API surface basically makes writing the framework really hard. So Theano had a compiler, but the compiler was really slow or wasn't very efficient. And that largely made things very difficult.

Soumith Chintala: And eventually, things evolved. There were, I think, tens of frameworks, and they all evolved to only two surviving as mainstream frameworks. Those two are PyTorch and TensorFlow, and they both treat the model as code and weights. And one thing to ask ourselves is, why did we enter this model-equals-code regime? Why didn't we just stay with config files and so on? And one of the reasons is basically that modelers were pushing the limits of frameworks. They were implementing ideas that look more and more like real programs. They had dynamic control flow and dynamic shapes, basically the shape of the input tensors changing from one iteration to the other, typically seen in object detection or NLP. Or if you looked at, say, GAN training, it was very different from standard image classification or any kind of classification, where typically you just did forward, backward, and then update, and then went to the next iteration: forward, backward, update. GAN training changed that loop, which means some internal details of these ML frameworks were no longer compatible with what modelers wanted. And self-supervised learning takes that to even more of an extreme, with schemes like BYOL or SimCLR, which recently became popular.

Soumith Chintala: They have a very complex training regime. And the training loop itself is very involved. So again, the whole field rolled towards the convenience of the modeler, the convenience of expressing the ideas of modelers. And it did come at a big cost. Both compiler and prod were generally unhappy because their lives got worse. It became harder for them to write a compiler or map efficiently to accelerator hardware if you're talking about more general programs. And same with prod: if the model is a config plus weights, prod could easily version models and such, but that wasn't the case anymore. With the model becoming code, prod had to figure out how to debug models in production and all kinds of nasty issues, and prod wasn't happy with this regime either, and still isn't. So you can ask, "Oh, there's three people, and somehow model-as-code stuck." And then the second question you could ask is, "Why do we have such a large API surface? That's not where we started." Caffe or cuda-convnet typically had a very small API surface. And again, it has to do with the fact that every few months, people publish some disruptive new results that involve some new building block or some new training regime that has to be expressed in different terms than the previous mid-level building blocks.

Soumith Chintala: So for the large part, we had these ML frameworks roll towards very low-level or mid-level building blocks, and a lot of them, to express all the mathematical functions and ideas that modelers had. It was, again, because of the convenience of the modeler, and it came at a cost: compiler and prod were even more unhappy. So why did modeler get so much leverage? If there's three people in this ecosystem, why is modeler getting so much importance? Why do they have so much leverage? That's a fairly important question to ask. The reason is that modeler is credited largely with making progress in the field. AI after 2012 slowly increased in hype to a point where everyone wants AI to do everything in the world. And modelers have been credited with trying to keep up, going towards that hyped-up world and making progress. And so they've been the ones who are creating all the new value. So the AI/ML software stack, compilers and everything else, has been evolving to take care of modelers, and that has been almost existential for compiler and prod to survive.

Soumith Chintala: For the large part, there seems to still be progress using whatever modelers do, so that's the way the field is going. Compiler isn't happy, right? Compiler is looking at themselves and they're like, "There are these three-year disruption cycles where some fundamentally important architecture that I thought would be important for the next 30 years is no longer used." I mean, you can look at, say, LSTM, or AlexNet, or VGG, or Inception, all these very popular architectures of their time that people were almost universally using, and only three to five years later no one uses them. I mean, LSTMs are old, but they started getting popular sometime in 2014, again because of work out of Google, and that's what I'm referring to. They got popular, and then once Transformers came out no one's using LSTMs anymore. So suppose someone somewhere, and I know of a few people who tried, builds specialized hardware or compilers or handwritten software implementations that are very specialized to, say, LSTM and ResNet-50. And that's pretty much all it does. But it does that 100 times better, or some promise like that.

Soumith Chintala: But then to develop that software or hardware, they would take three years. Well, by the time they actually ship, these things are no longer used. And that's a problem. So compiler is generally not happy that the only stable primitives they have been able to work with are convolution and general matrix multiply. That's also why GPUs are still extremely dominant and haven't really given up their market share to more specialized hardware yet. So what does compiler actually want? What would be better for them? They want something that looks like ONNX: they want a stable high-level IR that is small and closed within itself. And they just want it to not change, so they can build some specialized, high-performance expertise to map that more efficiently to new hardware, or just build new hardware that executes this high-level set of programs more efficiently. But modelers keep expanding the operator set, and they keep breaking all kinds of fundamental abstractions and keep going lower and lower down the stack. And they keep giving trouble to the compiler.

Soumith Chintala: And the other persona that's not happy is prod. Prod wants easily versioned, DevOps-like models. They want to be able to roll back. They want to do very simple things so that, if something goes wrong, there are very few variables that actually changed. They don't want you to pull some random Python function from some random Python package from the internet and then use that within your model, because then that model has to ship to production via prod. Between you writing the model and them shipping it, they have to figure out how to strip the model of that Python function, or figure out that it's actually safe and shippable. Or you have some constraints with production, right? You might say, "I need to ship it to Android" or something. Then it's a lot of work to ship Python into some app on Android. So prod generally isn't happy with doing crazy things, and modeler just does crazy things. And as I mentioned, modeler's leverage is that every three years they seem to have very big disruptions, and every few months they seem to have incremental disruptions. And while the pace of value creation has been slowing down, there still seems to be gas in the tank.

Soumith Chintala: So one of the reasons I would say PyTorch was successful is because it put modelers at the center of the universe. I used to give talks in the early days where I said, "Hey, I don't know if PyTorch is the fastest framework in the world, it might even be 10% slower, but it will give you more flexibility and debuggability and help you express your ideas better." And what that did is it made modelers' lives easier. And the compiler and prod people back then were like, "Yeah, but we will never ship this into production." And then what ended up happening was that, because modelers created future value and that future value depended on all this flexibility, compiler and prod actually had to come around and come to terms with the new reality. So let's talk about the future. Modeler's leverage, when does it end? Will it end, or will it maybe still increase? Is there still gas in the tank for modelers to keep innovating and keep getting credit for progress in AI?

Soumith Chintala: If so, compiler and prod will continue to underfit to the problem and be under-leveraged, compared with the better efficiency job they could do if they lived in a different, more stable world. Whenever we talk about the future, I typically think of it as a distribution over chains of events. You say, "Well, this thing can happen with probability x. And then if that happens, this next thing can happen." And then you just chain them. So I'll talk about a few events that could happen and how the ML frameworks stack would change. So let's see the effects of a few possible events. The first event is this: today, Transformers and convnets make up the majority of what people think are the answer to everything. Let's just hypothetically say that actually becomes true, and that they become the stable dominant architectures, where by dominance I mean they take all the heavy parts of the distribution of architectures that people use. Then what would happen to this diagram from before? Well, the API surface of the frameworks that is needed to be mainstream will actually reduce. The data structures will then shrink greatly; we wouldn't need so many, like tensors with five layouts and all that. And then pretty much everything lower in the stack will just have a much, much easier time.

Soumith Chintala: So the number of hand-optimized functions will shrink, the compiler will have an easier job, and the hardware people can start specializing more to, say, the shapes and sizes of the types of convolutions or matrix multiplies or whole transformer blocks that they need to compute. I don't know if that will happen, but if it does happen, then there will be a next wave of frameworks, which will again look like the classical frameworks, where they will just drive everything with config files and then specialize. You don't have to expose a much more generalized scientific computing framework to the general public. Hugging Face, which is already doing this, will become more dominant temporarily, and there will be other players that come in and try to take advantage of this insight. Let's talk about a second event. Let's say there is some hardware that looks very different from all the existing accelerators. And then there are some not obscure, but not as widely used machine learning models, such as probabilistic graphical models, or even some popular ones such as sparse networks, that have not been mapped efficiently enough to the current accelerators.

Soumith Chintala: Let's say they were mapped onto some hardware that looks very different, like Cerebras. And some disruptive results are shown. Then pretty much the entire stack of machine learning has to be rethought from scratch. And it would be a very, very, very disruptive event. And new frameworks that actually enable that work will take the mantle. And it can actually end up being a transformative event that gives an opportunity for new languages like Julia to start taking charge of the field. Right now, no one wants to move from Python because they don't have enough incentive to. So it could create the kind of incentive that makes such a change happen. And that would be interesting and exciting. And I would definitely look forward to something like that. The third event I want to discuss is this: let's say you had a particular regime where models were first strung together from a bunch of pre-trained priors or weights. And hence, models became much more data efficient. They didn't need as much labeled data. I think, typically, it depends on the priors and how they're expressed. But let's say the priors are neural networks. Then PyTorch and TensorFlow will probably continue their status quo. But then there will be whole websites that are about selling priors and about discovering priors, websites that are going to democratize prior discovery and usage.

Soumith Chintala: And there will probably be new sets of people trained to know which priors are better than others. And there might even be neural network architectures that predict which priors to string together for which problem. And if the priors are not just neural networks, but could be neural networks or mathematical functions of various kinds, then we need to figure out how these priors are pipelined together, and how they interoperate and talk to each other. The only way for mainstream frameworks to stay relevant within all this is if they can keep a very high velocity. Mainstream frameworks are very large and complex pieces of software, and they are being worked upon by lots of people. So if there's a change in the field and they don't keep up fast enough, they will eventually die. So the only way they can actually keep up is to maintain a very high velocity. And there will be specialized tooling that comes in all the time, because specialized tooling that is more niche, more specific, doesn't have the baggage that comes with moving slowly. So it can just move faster and be more efficient, but it won't have the advantages of full vertical integration.

Soumith Chintala: And so if mainstream frameworks do move faster, then they will just be able to kill specialized tooling over time. So the last words I want to leave you with from my talk are these: in science, progress is a combination of having great ideas and having the tools to execute those ideas. If either one is stuck in a local minimum, then all progress will stop. So let us continue to make progress by being open to both new ideas and new tools. Thank you.

Speaker 3: Joining us next is Andy Fang from DoorDash. Andy is the co-founder and CTO of DoorDash. As CTO, Andy is responsible for the overall product vision, technology roadmap, and architectural direction of DoorDash. Andy and the DoorDash team are actually Scale's downstairs neighbors in San Francisco. So pre-COVID, we used to run into each other all the time in the elevator. Andy, we're so sad not to get to see you more often. But thank you so much for joining us today.

Andy Fang: Hi, everyone. I'm Andy Fang. I'm the co-founder and CTO at DoorDash. I'm excited to be here today at Scale Transform to talk with you all about how DoorDash uses AI to power our marketplace. I'll first start off by giving you all a quick introduction to DoorDash. And then I'll quickly dive into two different case studies of how we apply AI here to power our marketplace. Starting off with our founding story: DoorDash was founded in 2013 out of a Stanford dorm room. The founders were driven by a mission to empower local merchants. We interviewed hundreds of local businesses in the Bay Area, and we talked to laundromats, toy stores, hair salons, restaurants, you name it. A common theme that we came across was that offering on-demand delivery for merchants would relieve them of a logistical headache while funneling them more customer demand.

Andy Fang: To test this idea, we launched PaloAltoDelivery.com as our minimum viable product: a website with PDF menus and a Google Voice number that forwarded to all of our personal cell phone numbers. That's where it all began. Everything was manual. Routing was done by one of us on a spreadsheet while all of us were on a group call together, and we also used Find My Friends to track each other's locations while we were out doing deliveries. Fast forward to today: we rebranded as DoorDash, and today DoorDash serves over 20 million customers and over 450,000 merchants, with a million-plus dashers fulfilling deliveries on the platform. We've also become the number one delivery player in both the restaurant food and convenience verticals in America. And we're currently in the United States, Canada, and Australia and looking to expand further globally.

Andy Fang: One of our early mantras was "do things that don't scale," which was evident in all the manual techniques we used to power deliveries in the early days. However, as we grew, we ultimately had to scale with exponential growth, and we've since automated many workflows. Now, we saw this as an opportunity not only to automate, but also to think about how we could apply AI techniques to do things better than we could perform manually. DoorDash processes millions of calculations per minute to determine how to optimally serve all three sides of the marketplace: the consumers, the dashers, and the merchants. I'll deep dive into two particular problem areas where we've applied AI to further our business and better serve our constituents. Starting off with our first case study, creating a rich item taxonomy. We have tens of millions of restaurant items in the DoorDash catalog, and tens of thousands of new items are added every day, most of which have unique taste profiles that we need to differentiate.

Andy Fang: Even for the same type of food item, let's say a chicken sandwich. Some customers would prefer a chicken sandwich from McDonald's, while others would prefer a chicken sandwich from Chick-fil-A. Not to mention all the options one can customize to the item, like adding lettuce, mild, spicy, really spicy, etc. In order to help customers find what they want, we need to be able to understand item characteristics at this lower level of detail. Merchandising and product teams want to be able to create curated editorial experiences like best breakfast options near you, or game night finger foods. Strategy teams may want to know if we have enough healthy food options available in the market to determine their sales strategy. Or let's say a customer searches for pad Thai but there's no available nearby options, we might want to understand what dishes with similar characteristics we can suggest instead. We can build specialized models for each of these tasks mentioned above. But that would take too much time to quickly test new ideas.

Andy Fang: Enter building a rich taxonomy to help us solve this problem. Now, in order for us to build this rich taxonomy, we decided to approach these cold start and scaling problems by looking at all the tags we're interested in and then building models to automatically tag every item in our catalog according to this taxonomy. We integrated these models into a human-in-the-loop system, defined as a model that requires human interaction, allowing us to collect data efficiently and substantially reduce annotation costs. Our final implementation was a system that grows our taxonomy as we add tags and uses our understanding of the hierarchical relationships between tags to efficiently and quickly learn new classes. There were critical things we needed to consider in defining annotation tags. One, making sure that there are different levels of item tagging specificity that don't overlap. So let's say for coffee, you can say it's a drink, you can say it's non-alcoholic, and you can say it's caffeinated. Those are three separate labels that don't overlap in categorization with each other.

Andy Fang: Second, allow annotators to pick other as an option at each level. Having other is a great catch all option that allows us to process items that were tagged in this bucket to further see how we can add new tags to enrich our taxonomy. Third, make tags as objective as possible. You want to avoid tags like popular or convenient. Things that would require subjectivity for an annotator to determine. Now we can leverage the tags that we developed towards developing a high precision and high throughput task. High precision is critical for accurate tags, high throughput is critical to make sure that our human tasks are cost efficient. Our taxonomy naturally lends itself towards generating simple binary or multiple choice questions for annotation with minimal background information. So you can still get high precision using less experienced annotators and less detailed instructions, which makes annotator onboarding faster and reduces the risk of an annotator misunderstanding the task objective.
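
The three guidelines above can be sketched as a small data structure. This is a hypothetical illustration in plain Python, not DoorDash's actual schema: the class name `TagLevel`, the attribute names, and the option lists are all assumptions. It shows non-overlapping attributes for the coffee example, an "other" catch-all, and how each level turns into a simple binary or multiple-choice question. As a simplification of the talk's "other at each level" guideline, the sketch omits "other" on yes/no attributes.

```python
from dataclasses import dataclass


@dataclass
class TagLevel:
    """One taxonomy attribute and its allowed answers (hypothetical schema)."""
    name: str
    options: list
    allow_other: bool = True  # "other" acts as the catch-all bucket

    def choices(self):
        return self.options + (["other"] if self.allow_other else [])


def make_question(item_name, level):
    """Turn one taxonomy level into a binary or multiple-choice task."""
    choices = level.choices()
    kind = "binary" if len(choices) == 2 else "multiple_choice"
    return {
        "item": item_name,
        "attribute": level.name,
        "type": kind,
        "choices": choices,
    }


# Non-overlapping, objective attributes for the coffee example.
levels = [
    TagLevel("item_type", ["drink", "food"]),
    TagLevel("alcoholic", ["yes", "no"], allow_other=False),
    TagLevel("caffeinated", ["yes", "no"], allow_other=False),
]
questions = [make_question("coffee", level) for level in levels]
```

Items that land in the "other" bucket can then be reviewed in bulk to decide which new tags the taxonomy is missing.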

Andy Fang: Now, you can see in this example here that with a fried chicken sub there are different kinds of options that we gave an annotator to label it, such as whether it's a sandwich or a burger, or whether it's vegan or not. Now we want to talk about how we set up the human-in-the-loop system. Basically, what we did is we had the annotations feed directly into a model. You can see this in the diagram here, where the steps with human involvement are in red and the automated steps are in green. This loop allows us to focus on generating the samples we think will be most impactful for the model. Not to mention, we also have a loop to do QA on our annotations, which makes sure that our model is being given high-quality data. Through this approach, we've been able to almost double recall while maintaining precision for some of our rarest tags, leading directly to substantially improved customer selection. Pictured here is an example of the difference between the old tags, where only items labeled literally with the word "dessert" will be returned, and the new tags, where a query for dessert can be expanded with query understanding and can walk down our taxonomy so that we can go beyond simple string matching.
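
The difference between string matching and walking down the taxonomy can be illustrated with a short sketch. Everything here is an assumption made for illustration: the `TAXONOMY` fragment, the `CATALOG` items, and the function names are hypothetical and are not DoorDash's search implementation.

```python
# Hypothetical taxonomy fragment: parent tag -> child tags.
TAXONOMY = {
    "dessert": ["cake", "ice cream", "cookie"],
    "cake": ["cheesecake"],
}

# Hypothetical catalog: item name -> tags assigned by the tagging models.
CATALOG = {
    "Dessert sampler": {"dessert"},
    "New York cheesecake": {"cheesecake"},
    "Vanilla sundae": {"ice cream"},
    "Chicken sandwich": {"sandwich"},
}


def expand(tag):
    """Return the tag plus all of its descendants in the taxonomy."""
    seen, stack = set(), [tag]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(TAXONOMY.get(current, []))
    return seen


def search_by_string(query):
    """Old behaviour: match only items whose name contains the query."""
    return sorted(name for name in CATALOG if query.lower() in name.lower())


def search_by_taxonomy(query):
    """New behaviour: match items tagged with the query or any descendant."""
    wanted = expand(query.lower())
    return sorted(name for name, tags in CATALOG.items() if tags & wanted)


print(search_by_string("dessert"))    # ['Dessert sampler']
print(search_by_taxonomy("dessert"))  # also finds the cheesecake and sundae
```

The matching function over the catalog stays simple in both cases; the extra recall comes entirely from the tags and the hierarchy between them.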

Andy Fang: You know, as opposed to the initial query, which only indexes items with the keyword "dessert", we're able to select far more items that we actually consider to be desserts without modifying the search algorithm itself. We're also able to power use cases such as screening for restricted items (21-plus for alcohol, for example) relatively easily. Pictured here is a sample of items our model recognizes as alcohol. Now I want to talk through a second case study, which is creating an optimized delivery menu. For a restaurant creating an online experience on DoorDash, the online menu is the main way to attract customers. Since the menu is the main online touch point, an unattractive or poorly organized menu can have a huge negative impact on a merchant's online conversion rate, regardless of the food quality. If merchants do not design their menus correctly, customers won't be as attracted to their online offerings and won't buy as often. In order to succeed online, merchants need to utilize a set of menu-building practices to attract new customers. Empowering local merchants is the DoorDash mission, so we strive to help merchants best present themselves to the customer base of DoorDash.

Andy Fang: In order to surface the characteristics that make for successful online menus, we utilize AI to analyze thousands of existing menus on our platform. We then translated these characteristics into a series of hypotheses for A/B tests. We saw a huge improvement in menu performance from experiments involving header photos and more info about the restaurant, and we also intend to conduct further experiments on how adding different information can further improve menu performance. While staff at a restaurant can help sell the menu by crafting a story around an item or giving live recommendations, quite frankly, this can't happen online. For example, customers might not be familiar with a particular dish, and they would need much more information from the online menu itself, through photos or descriptions, to really make the leap of faith and order it. Customers also need help when parsing an online menu to make their decision, and we want to make this experience as easy as possible. We want to remove friction by cutting out unpopular items, adding more detailed labels or descriptions to explain menu items, and clearly categorizing menus to make them easy to navigate.
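
The talk does not say how the A/B tests were evaluated. As one common possibility, a two-proportion z-test can compare the conversion rate of a control menu against a variant (for example, one with a header photo). The function name and the counts below are hypothetical and the numbers are made up purely for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (a) and variant (b).

    Returns the z statistic and the two-sided p-value under the pooled
    null hypothesis that both groups share the same conversion rate.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value for a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 1,000 menu visits per arm.
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in conversion is unlikely to be noise, though in practice sample-size planning and multiple-comparison corrections also matter.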

Andy Fang: We define a successful menu as one with a high conversion rate at the end of the day. To build a set of features for this kind of model, we looked at each layer of a menu, from the high-level menu appearance to detailed modifiers for each item. For each layer we brainstormed features relating to key elements such as menu structure, how customizable the items are, and visual aesthetics like brand pictures, things of that nature. Different merchants structure their menus differently. For example, take a look at these Chicago-based restaurants. Duck Duck Goat keeps to a minimalistic but comprehensive selection of categories, including mini combos. Ippolito's Pizzeria, on the other hand, has a very long menu, kind of following a traditional Italian menu structure, complete with popular item photos, a header image, and a logo. And lastly, Big Star has a very simple menu showing two main categories for food. You've got the classics and the tacos, and you have a consistent photo layout.

Andy Fang: Another thing to consider is just how customizable these items are: what kind of modifiers are available? While some merchants provide a running list of options available for each item such as for 88, other menus keep the option list short to limit the online menu and to simplify preparations. To find out which menu features were the most influential in menu conversion, we used the features mentioned above as inputs to regression models predicting menu conversion. We built our initial regression models using linear regression and basic tree models to achieve a baseline error. While the results there were interpretable, the error rate was pretty high. On top of that, many of the features seemed to be correlated, which led to an issue of collinearity and made it difficult to determine the direction in which changes in each feature impact the target variable.
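
One simple way to spot the collinearity issue described above is to flag pairs of features whose pairwise correlation is very high. The sketch below is an assumption about how one might check this, not the method used in the talk; the feature names, values, and the 0.9 threshold are invented for illustration.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def collinear_pairs(features, threshold=0.9):
    """Return (feature_a, feature_b, r) for pairs with |r| at or above the threshold."""
    flagged = []
    for name_a, name_b in combinations(features, 2):
        r = pearson(features[name_a], features[name_b])
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Made-up menu features for five menus (illustration only).
menu_features = {
    "num_photos": [2, 5, 8, 12, 20],
    "num_items_with_photos": [1, 4, 7, 11, 19],
    "num_categories": [6, 3, 9, 4, 7],
}
print(collinear_pairs(menu_features))
```

When two features move together like this, a linear model can assign them unstable or even opposite-signed coefficients, which is why the direction of each feature's effect becomes hard to read.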

Andy Fang: Not being able to explain this clearly was a pain point for us and is a pain point for black-box models in general. To solve this problem we used Shapley values, which are a game-theoretic approach to model explainability. Shapley values represent the marginal contribution of each feature to the target variable and are calculated by computing the average marginal contribution to the prediction across all permutations before and after withholding that feature. So after examining the resulting Shapley plots of the final model, as you can see here, the top success factor was the number of photos on the menu. This is particularly important for the top items in the menu, as photo coverage of the top items appears much more prominently in the menu's overall appearance. There were some other top factors and recommendations we made to merchants to help make their menus perform better, one of which is giving higher customizability for items. We found that customers enjoy optionality within the top items, and the ability to customize provided a degree of familiarity that they could find while dining in.
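
The permutation-based definition of Shapley values given above can be written out directly for a tiny model. This is a brute-force sketch for illustration, not the tooling used in the talk: it enumerates every feature ordering, so it is only practical for a handful of features. "Withholding" a feature is modelled here by setting it to a baseline value, which is an assumption, and `toy_conversion_model` with its coefficients is entirely made up.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    `predict` maps a feature dict to a number; a "withheld" feature takes
    its value from `baseline` (an assumption made for this sketch).
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)       # start with every feature withheld
        previous = predict(current)
        for name in order:
            current[name] = instance[name]   # reveal this feature
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_conversion_model(features):
    """Made-up conversion model with an interaction term (illustration only)."""
    photos = features["num_photos"]
    modifiers = features["num_modifiers"]
    return 0.10 + 0.02 * photos + 0.01 * modifiers + 0.005 * photos * modifiers


menu = {"num_photos": 4, "num_modifiers": 2}
no_info = {"num_photos": 0, "num_modifiers": 0}
print(shapley_values(toy_conversion_model, menu, no_info))
```

The contributions always sum to the difference between the prediction for the menu and the prediction for the baseline, which is what makes the attribution easy to read feature by feature. Libraries built for this purpose use approximations or model-specific shortcuts, since the number of orderings grows factorially with the number of features.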

Andy Fang: Another factor was that menus with a healthy mix of appetizers and sides also converted better. This provides customers with more choices to complete their meal and can lead to higher cart values for merchants as well. While the top factors that lead to successful online menus aren't surprising, we also have to note that there are variances by cuisine type. This variance is a classic case of where the averages can be deceiving, since the average doesn't really represent a normal distribution. And also, menus are not all the same. By better understanding customers' expectations around certain types of restaurants and cuisines, merchants are able to better build and customize their menus to be more truthful to their brand and the food that they're serving.

Andy Fang: The clearest example of customer expectations at work was actually what we observed with Chinese menus. Unlike most other menus, Chinese menus that were long and had a ton of items actually performed better. When customers dine in at Chinese restaurants, a long and complicated menu, sometimes even one with regional specialties, indicates authenticity. Take a look at this menu. Customer expectations about an authentic Chinese menu lead to these complicated menus having greater conversion rates. On the other hand, menus for wings and pizza places tended to be shorter and well photographed, and also had a ton of options for customizing. You can think about it this way: when customers visit these types of merchants, they usually have one or two items in mind or are just looking for things to customize, like sauces for wings or toppings for pizzas.

Andy Fang: To wrap it up, these are just some of the ways we're using AI to scale our marketplace, to make it easier for customers to find what they want, and to make it easier for merchants to position themselves in an online world. If you're curious to learn more about these two particular case studies, or you're curious to read about more case studies, feel free to check out our DoorDash engineering blog. It's just DoorDash.engineering. It was a pleasure to speak with all of you today. I really hope you enjoyed it and took something out of it, and enjoy the rest of the conference.
