
Creating Personalized Listening Experiences with Spotify

Posted Oct 06, 2021 | Views 6.1K
# TransformX 2021
# Keynote
SPEAKER
Oskar Stal
VP of Personalization @ Spotify

Oskar is Spotify's VP of Personalization. He leads product, engineering and research to provide the world's most engaging programmed music and podcast experience. Throughout his 10+ year tenure at the company, he's introduced large-scale machine learning technologies to hyper-personalize the listening experience for more than 365 million users worldwide, be that through Spotify's home, search or personalised playlists like Discover Weekly or Daily Mix. Oskar started programming at the age of 12, a passion for technology sparked by the Commodore 64 back in 1986. As a teenager he toured with the C64 to various competitions and gatherings for computer geeks. Even though the long hair has been cut, the C64 still comes out every now and then. Oskar holds a Master of Science in Computer Science from KTH Royal Institute of Technology and lives in Stockholm.

SUMMARY

Oskar Stal, VP of Personalization at Spotify, shares how Spotify ensures each of their 365 million listeners has a personal and unique experience as they each explore and enjoy the music they love. In this keynote, Oskar shares Spotify's evolution in machine learning from a single recommendation feature that engineers worked on in their spare time, to the wide-scale deployment of multiple personalized content recommendations for each Spotify user. He shares how Spotify uses numerous sources of data to create a personal experience. Oskar also discusses how Spotify uses reinforcement learning to maximize short- and long-term recommendation objectives to build long-term user engagement. What makes a good content recommendation? How should you optimize recommendations to build a lifetime of loyal membership? How can you use simulation to train better recommendation models? Join this session to learn how Spotify uses AI to build personalized listening experiences.

TRANSCRIPT

Nika Carlson (00:00): (music).

Nika Carlson (00:15): Next up, we're thrilled to welcome Oskar Stal. Oskar is Spotify's VP of Personalization. He leads the product, engineering and research to provide the world's most engaging programmed music and podcast experience. He's introduced large-scale machine learning technologies to personalize the Spotify experience for more than 365 million users worldwide. Oskar started programming at the age of 12, a passion for technology he derived from the Commodore 64 back in 1986. Oskar holds a Master of Science in Computer Science from KTH Royal Institute of Technology, and lives in Stockholm. Oskar, take it away.

Oskar Stal (01:02): Hello, everyone. My name is Oskar Stal and I'm the Head of Personalization at Spotify. This means that I lead hundreds of engineers, product managers, and researchers who are all working on content personalization. Some of you may have experienced some of our work, maybe you've gotten some nice recommendations that you really liked, maybe some serendipitous discoveries that have helped you find something new.

Oskar Stal (01:30): Over the years, recommendation at Spotify has become some of our special magic touch. Today, I'm going to talk a little bit about how that all works and also talk a bit about the next chapter on this mission. First of all, let's take a look at some context. What you're looking at now at the screen is the Spotify company mission. As you can see here, we're focused both on creators and listeners at the same time. We view this as a symbiotic relationship. One can't really succeed without the other.

Oskar Stal (02:10): And we're also doing this at enormous scale. The way we think about it, we're trying to balance two sides of a table. On one side of the table, there are 365 million users, and on the other side, you could think about 70 million tracks or 2.9 million podcasts. But there isn't really just one experience. We think about it as 365 million different experiences. One experience for each user. And it would take a person 1,475 years to go through all of those 200 petabytes of content. We're also recording around half a trillion events every day. You could see how this is a good place for machine learning.

Oskar Stal (03:01): But our journey into machine learning wasn't always that obvious. I've been with Spotify for 12 years, and in the beginning, it wasn't clear that machine learning was something of interest to Spotify. Some of you may remember when Spotify was launched in 2008, that Spotify was mostly about access to music. You would go in, you would find your artist, your album, create your playlist. But pretty early on, we started to experiment with collaborative filtering because we thought it would be interesting to see what we could do there.

Oskar Stal (03:36): The challenge we had at that time was really how to do some kind of an approximation algorithm that could operate at our scale. It was really the approximation, the math behind doing collaborative filtering at all that was our initial challenge. And we managed to get over it, come up with some interesting ways to get it done, and that became our first fingerprinting of users' tastes and created some first experiences.
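
To make the idea concrete, here is a minimal sketch of collaborative filtering as a low-rank factorization of a user-by-track play matrix, with similarity queries over the resulting "fingerprints." The data, dimensions, and function names are invented for illustration; this is not Spotify's actual algorithm, which had to approximate this kind of computation at far larger scale.

```python
# Minimal collaborative-filtering sketch: factorize a toy (users x tracks)
# play-count matrix into low-dimensional taste/content vectors, then answer
# similarity queries in that embedding space. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
plays = rng.poisson(0.3, size=(1000, 500)).astype(float)  # toy user x track counts

# Truncated SVD as a simple stand-in for large-scale matrix factorization.
U, s, Vt = np.linalg.svd(plays, full_matrices=False)
k = 32
user_vecs = U[:, :k] * s[:k]    # per-user taste fingerprints
track_vecs = Vt[:k, :].T        # per-track content fingerprints

def similar_tracks(track_id: int, n: int = 5) -> np.ndarray:
    """Nearest tracks by cosine similarity in the embedding space."""
    v = track_vecs[track_id]
    norms = np.linalg.norm(track_vecs, axis=1) * np.linalg.norm(v) + 1e-9
    sims = track_vecs @ v / norms
    return np.argsort(-sims)[1 : n + 1]   # skip the track itself

print(similar_tracks(42))
```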

Oskar Stal (04:09): Today, matching content and users is at the core of Spotify. And you're faced with this from the start, as you start the app. You may come into Spotify and see our start page, or you may come in to search for content, but you're faced with personalization right away. And obviously, we're using many different machine learning techniques today, which means that we can serve even the narrowest of tastes. And personalized playlists are really a key part of what we do.

Oskar Stal (04:46): One example I want to talk about is Discover Weekly. This really started out as a hack week project, and it was a couple of engineers at Spotify who thought it would be a great idea, and they worked on it on their spare hack time. Initially, leaders at Spotify didn't pay much attention as the product was slowly growing momentum inside of Spotify and more and more employees found it and liked it and loved it. After some time, we started to realize that maybe this is something interesting that we need to do more with, and it became a real project and something that we started to invest in.

Oskar Stal (05:27): When we first launched Discover Weekly, we were very positively surprised by the response we got, and today, across all of our playlists, we drive somewhere around 16 billion artist discoveries every month. And all of these playlists we create are unique to each of our 365 million users. And today, we have many, many playlists, not just Discover Weekly. We have those that focus on discovery, we have others that focus on your various habits. It could be Time Capsule or On Repeat, for example. We have social playlists that focus on you and your friends and how you can listen together. We have mood and interest playlists, we have mixes, we also have some playlists that mix podcasts and music together.

Oskar Stal (06:16): Sometimes, we talk about an approach that we have called "algotorial" inside of Spotify, and what we really mean is using editors together with machine learning to create really good experiences. And one way you could think about it is that an editor is creating a script or a manuscript of an experience that they want to do, that they can then hand off to the machine learning algorithm to give every user their own individual version of that playlist.

Oskar Stal (06:49): One of the things that we figured out quite early on was that we really wanted to build a lot of different features that were personalized. And to do this, it was important that we can do them in a fast way and in a scalable way and in a repeatable way. So, we had to really think about how can we create many different personalization features in a quick and good way. That's why we thought about these three layers in an architecture. These layers really helped to reuse and quickly stand up like a new feature.

Oskar Stal (07:29): First, let's take a look at the bottom layer here. This is where we think about data. There are three types of data you could think of. Obviously, user data like playlists, what are you listening to, what are you clicking on. There is content data, data that we get either from the web or from the content providers, things that they send to us. And then, the third category is really the audio profiles, maybe the audio profile of a song or a podcast, et cetera, et cetera.

Oskar Stal (08:04): One thing that we've been investing in a lot the last couple of years is the instrumentation of our clients. Basically, how can we understand how our users are interacting with our clients and interacting with our content? Let's say that you want to build a new feature, like a new screen in the mobile client. We have a framework where you will go in and define specifically all the events and all the different interactions that will happen on this screen.

Oskar Stal (08:34): On this slide, you can see the middle phone here is showing that framework and you can see all the various events that have been defined for this one. And then, as you do that, the framework will create code snippets for you that you can then copy and put into your code. And then, as you start to run your feature, data events are emitted that match back to the screen that you're building, so to speak. And by doing this, it becomes really easy to do detailed instrumentation of your new interface that can be reused by others.
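
A hypothetical sketch of what such client instrumentation can look like: events for a screen are declared up front, and every emitted event carries a shared interaction ID. The class and field names here are made up for illustration; they are not Spotify's actual framework.

```python
# Hypothetical client instrumentation: declare the event shape once, then emit
# structured events that all share an interaction id for later linking.
import json
import time
import uuid
from dataclasses import dataclass, field, asdict

@dataclass
class UIEvent:
    screen: str          # e.g. "home", "playlist", "artist"
    action: str          # e.g. "impression", "click", "play"
    entity_uri: str      # what was shown or played
    interaction_id: str  # ties the whole journey together
    timestamp: float = field(default_factory=time.time)

def emit(event: UIEvent) -> None:
    # In a real client this would go to an event pipeline; here we just print.
    print(json.dumps(asdict(event)))

interaction_id = str(uuid.uuid4())
emit(UIEvent("home", "impression", "playlist:discover_weekly", interaction_id))
emit(UIEvent("playlist", "play", "track:1234", interaction_id))
```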

Oskar Stal (09:19): We also do some automatic linking of these events. We create automatic IDs that make it easy to link all of these events together. In this example, you can see someone's looking at their home screen, they're clicking on a playlist, and then from that, they're going to an artist page and then to an album page, and then they're listening to a couple of songs.

Oskar Stal (09:42): And by having these IDs, we can link this all the way back. That song you're listening to, we can link back to that initial home session that you had. And by doing that, we can understand the downstream implications of the recommendations we make. And as I'm sure you can realize, that's really valuable when you're trying to do recommendation algorithms.
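
Continuing the toy example above, a shared interaction ID makes this downstream attribution a simple grouping operation; the event shapes here are assumed, not Spotify's.

```python
# Toy attribution: group events by interaction id and trace streams back to
# the recommendation surface that started the journey. Data is invented.
from collections import defaultdict

events = [
    {"interaction_id": "abc", "screen": "home", "action": "impression", "entity": "playlist:chill_mix"},
    {"interaction_id": "abc", "screen": "playlist", "action": "click", "entity": "artist:xyz"},
    {"interaction_id": "abc", "screen": "album", "action": "stream", "entity": "track:42"},
]

journeys = defaultdict(list)
for e in events:
    journeys[e["interaction_id"]].append(e)

for iid, steps in journeys.items():
    origin = steps[0]["entity"]
    streams = [s["entity"] for s in steps if s["action"] == "stream"]
    print(f"recommendation {origin} led to streams {streams}")
```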

Oskar Stal (10:08): At Spotify, everyone does data production. It means we have a very rich, but also very fragmented data situation going on. Lots of different data sources. To make the world a bit easier, we have something called Golden Datasets. Golden Datasets are our most strategic data that we really want to be easily accessible. They basically have to adhere to a specific SLO. As a user of a Golden Dataset, you know what to expect, you know when it will be delivered, when it's calculated, you know that you can trust its quality, et cetera.

Oskar Stal (10:50): And they're also published in a portal that we call Backstage. This is a tool that Spotify has open sourced that makes it really easy to access, find and understand the details of these Golden Datasets. These Golden Datasets really ensure that it's easy for you to build a quality pipeline. You can trust that the data coming out is going to be there for you when you need it and that it really works for you.

Oskar Stal (11:17): Then, let's take a look at the middle part of things. Here, we have a lot of shared machine learning models that basically give you information that is very usable for many different use cases. Some examples are a user's affinities to various things. How much do you like this artist, how much do you like this album, how much do you like this playlist? It could also be similarities. Are these two artists similar? Give me five artists that are similar to this artist, or give me five playlists that are similar to this playlist. It could also be clustering. Here are 20 artists, tell me how they belong together.

Oskar Stal (12:00): And as you can imagine, all of these various APIs are handy and useful for a lot of different features that we're trying to build. Here's one example. We basically have this massive embedding space that is created from collaborative filtering. You can think about this as fingerprints for everything we have. Artists, tracks, listeners, playlists, anything you can imagine has a fingerprint in this embedding space.

Oskar Stal (12:28): And that is then very useful to do queries, like what are your favorite artists, which playlists you may like, are these two artists similar or not, how do we cluster artists together, do these four make sense together, et cetera, et cetera. This is a very good tool that is reused for a lot of different features.
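
As one illustration of the kind of reusable query such a shared embedding space supports, here is a small sketch that clusters a handful of artist vectors with k-means. The vectors are random placeholders, not real artist embeddings.

```python
# Toy clustering query over a shared embedding space: "here are 20 artists,
# tell me how they belong together." Vectors are synthetic placeholders.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
artist_vecs = rng.normal(size=(20, 32))   # 20 made-up artist fingerprints

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(artist_vecs)
for cluster in range(4):
    members = np.where(labels == cluster)[0]
    print(f"cluster {cluster}: artists {members.tolist()}")
```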

Oskar Stal (12:52): And at the top layer are the features themselves, Discover Weekly or search or our start page, for example. Here we have machine learning models that are created for that specific use case. These models will be using data from the shared models and from the data as input, but they are optimizing for different use cases, different goals and different things. Here we can take a look at an example. Basically, as input, we're thinking about what is the intent, is this for a party or is this for a workout, is the user on mobile or is the user on desktop?

Oskar Stal (13:33): We think about generic things, like what is the time of day, is it weekend, is it weekday, for example? But of course, we also think about things like your taste. What do you like, what do you listen to, what do you not like, et cetera? All of that will go into the specific model, we're crunching it and out comes five tracks that we think make perfect sense for you to listen to next. Then, we'll be recording your interactions, like how do you react to these five tracks, what do you do with them via the instrumentation that we discussed earlier.

Oskar Stal (14:07): We record all of that, and then we retrain the algorithm every day so that it gets a little bit better at understanding what it's trying to do. And the target here will be different for different use cases. Is the target about discovering a new track or a new artist, or is it more about you relaxing with familiar music, et cetera? That's going to also vary from use case to use case.
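
A hedged sketch of what a feature-specific ranker along these lines might look like: context signals (intent, platform, time of day) and a taste-match score feed a simple linear model that returns five tracks. The feature names and weights are invented; in the setup described above, the weights would be learned from recorded interactions and retrained daily.

```python
# Illustrative feature-level ranker: combine context and taste features,
# score candidate tracks, return the top five. All names/weights are made up.
import numpy as np

def build_features(context: dict, track_vec: np.ndarray, user_vec: np.ndarray) -> np.ndarray:
    return np.concatenate([
        [1.0 if context["intent"] == "workout" else 0.0],
        [1.0 if context["platform"] == "mobile" else 0.0],
        [context["hour_of_day"] / 24.0],
        [float(user_vec @ track_vec)],   # taste match from shared embeddings
    ])

def rank_tracks(context, user_vec, candidates, weights):
    scores = {tid: float(weights @ build_features(context, vec, user_vec))
              for tid, vec in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)[:5]

rng = np.random.default_rng(2)
user_vec = rng.normal(size=8)
candidates = {f"track_{i}": rng.normal(size=8) for i in range(50)}
weights = np.array([0.5, 0.1, 0.2, 1.5])   # in practice learned and retrained daily
context = {"intent": "workout", "platform": "mobile", "hour_of_day": 7}
print(rank_tracks(context, user_vec, candidates, weights))
```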

Oskar Stal (14:30): Now, let's switch gears and move into some more difficult questions. What is a really good recommendation? A few years ago, we did some research on this topic, and as you can see here on the slide, we were looking at things like relevance, popularity, et cetera. What we found here was, surprise, surprise, relevance is really important. If we want to recommend something, it should be relevant to the user. But what was maybe more surprising and interesting is that popularity didn't matter so much. Whether the track is popular or not is not so important.

Oskar Stal (15:11): However, diversity matters a lot. Basically, what the science showed in this particular research is that a relevant, diverse piece of content is really the best recommendation that we can give you. Now that we know that it's not just about getting another stream or getting another recommendation, what do we do with that? How can we rethink or think deeper about what a good recommendation is, because this topic is a big priority for us right now. Let's take a bit of a closer look at that work and see how we are approaching that important question.
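
One common way (not necessarily Spotify's) to trade relevance against diversity when assembling a slate is greedy maximal-marginal-relevance re-ranking, sketched below with made-up relevance scores and embeddings.

```python
# Greedy MMR-style selection: pick items that are relevant but not too similar
# to what has already been picked. Scores and vectors are synthetic.
import numpy as np

def mmr_select(relevance, vecs, k=5, lam=0.7):
    """Pick k items balancing relevance against similarity to chosen items."""
    chosen = []
    remaining = list(range(len(relevance)))
    unit = vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-9)
    while remaining and len(chosen) < k:
        def score(i):
            sim = max((float(unit[i] @ unit[j]) for j in chosen), default=0.0)
            return lam * relevance[i] - (1 - lam) * sim
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen

rng = np.random.default_rng(3)
print(mmr_select(rng.random(30), rng.normal(size=(30, 16))))
```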

Oskar Stal (15:48): The way that we want to think about it is that we're not just optimizing for the current moment. We're not optimizing for the thing that is most likely to get clicked, not optimizing for the thing that is most likely to get streamed, or optimizing for just driving more listening time in the current moment. Instead, we want a healthy journey for a lifetime of fulfilling content. We want to think about Spotify as a membership service.

Oskar Stal (16:24): The most important decision you make is not, "Do I listen to one more track? Do I listen to one more podcast?" The most important decision happens at the end of the month when you decide, "Do I keep paying for Spotify or do I keep being a member? Do I want to continue to listen to Spotify next month?" That to us is the important decision point that we're trying to optimize for.

Oskar Stal (16:49): This then ends up being a lot about balancing things. One critical question is how to create this balance in a good way. Some examples will be balancing the familiar with the new. Do you want to hear your old favorite or do you want to hear something you never heard before? The recommendations, should they be obvious, like songs that are fairly obvious to you, or should they be totally unexpected, serendipitous recommendations that will take you to a new place? For example, should we be suggesting new genres or new topics, or should we stay safe and recommend within the existing genres that you really like?

Oskar Stal (17:32): Maybe you joined Spotify for dance music, but can we help you focus while studying? Essentially, by balancing these polarities, can we help you get a more fulfilling content diet? In practice, how could we help the user on this journey? We think about it as a journey of discovery. And we, at Spotify, we're doing a series of recommendation actions to take you on this journey of discoveries. And we have to think about your wants, things that keep you in the comfort zone, things that are the tracks you know, the tracks you love, and your needs, things that may enhance your listening down the line, but may not be exactly what you expect right now.

Oskar Stal (18:22): For example, maybe it is a good idea that on your morning walk, you start listening to some news. And those will give you delayed rewards, because now you're happier in the future, but that may not be so obvious in the moment right now. The end result we're trying to get to here is how can we augment the listener experience by balancing these two in a really good way?

Oskar Stal (18:48): Let's think about audio and all the audio content out there as a vast landscape with peaks and valleys. This abstract audio landscape is a place where the user is wandering around, and let's think about a higher altitude as being more satisfied, a greater experience, and a lower altitude being less satisfied. How can we, Spotify, help you find a pathway through this landscape that will bring you to a higher altitude?

Oskar Stal (19:26): For example, should you walk straight ahead, listen to some of your favorite music and maybe, around the corner, there is a hill that will take you to a much higher altitude and a much better place? Or should we take you up the higher path, the more difficult path, giving you some discovery tracks and you walk up the stairs, and maybe that will lead you to a better place, a higher altitude? We think about this as Spotify nudging the user through our recommendations actions, trying to help you find the best path through this landscape, to your higher altitude.

Oskar Stal (20:06): Therefore we think this is a really good case for reinforcement learning. And let me tell you a little bit more about that. Here's what I mean. We at Spotify started in 2008 working with Bandits. You can think about Bandits as the simplest form of reinforcement learning, basically reinforcement learning without any states. Can we build on that to get to what we just discussed earlier? Now, we're thinking about the user as having a state, a state that represents your relationship with audio content, your relationship with Spotify. Are you really happy, you think it's amazing, or are you just listening a little bit once a week and are not so super excited about it?

Oskar Stal (20:55): And we are then applying a series of actions through recommendations, trying to change your state and find a pathway through all audio content that will lead you to this better place. It's like being in a maze and trying to optimize for that long-term reward of finding your way out of that maze. Here, you can see the reinforcement loop from the books.

Oskar Stal (21:21): On the left hand side, you can see things like the actions that we take as Spotify, like what do we recommend, what do we play, which track are you going to hear next. You see the observations we make. You remember, we talked about the instrumentation and everything that we see that you're doing with the experience. We also have the rewards that the user gives to us. Maybe you save a track, you like a track, or you do something explicit about a track.

Oskar Stal (21:50): On the right side, you see the algorithm or the recommendation algorithm itself. And here, we're really trying to maximize the future cumulative rewards. Basically, what do we think will be the accumulated rewards that we're going to get from you through our actions essentially nudging the user? And this is an approach then to try to optimize for a journey to a more fulfilling content diet. The goal is really to maximize that future sum of rewards that we're collecting from you as a user. And if you think about this loop spinning around here, you can imagine how we can constantly learn how to find a better path, or learn how to be better, essentially become better pathfinders for all of our 365 million users.
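
To illustrate "maximize the future cumulative rewards" in its simplest form, here is a generic tabular Q-learning loop on a toy listener MDP. The states, actions, and reward function are invented and far simpler than anything in the real system.

```python
# Generic tabular Q-learning on a toy "listener" MDP: states stand in for
# satisfaction levels, actions for recommendation types. Purely illustrative.
import numpy as np

n_states, n_actions = 5, 3
gamma, alpha, eps = 0.9, 0.1, 0.2
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(4)

def step(state, action):
    """Toy environment: one action tends to nudge the listener upward."""
    drift = 1 if action == 2 and rng.random() < 0.6 else int(rng.choice([-1, 0, 1]))
    next_state = int(np.clip(state + drift, 0, n_states - 1))
    reward = float(next_state)          # more satisfied states yield more reward
    return next_state, reward

state = 0
for _ in range(20000):
    action = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[state].argmax())
    next_state, reward = step(state, action)
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

print("greedy action per state:", Q.argmax(axis=1))
```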

Oskar Stal (22:41): One of the things we've been doing is that we've been looking at our users, and essentially, we've been looking at specifically those users who already seem to be doing a great job at this themselves. They're looking around, they're finding, they're discovering, they're having a great time, and we can see how they are, over time, getting more and more happy with Spotify, listening to more and more content. And they are, through their own means, improving their satisfaction with Spotify. Analyzing them, what can we learn from them, what do they do, where do they look, how do they discover, et cetera?

Oskar Stal (23:16): Another thing we're doing is also trying to predict how satisfied you are as a user. And the approach we've been taking here is that we're modeling state changes for users. You can see here on the slide a couple of various states that the user can be in. And then, we built survival models that essentially create probabilities for users to move between these various states. We input everything we know about a user, like how they're listening and when they're listening and all of those types of information, and we output various probabilities of moving between these states.

Oskar Stal (23:55): And since we have such a massive volume of data from all of our 365 million users, we can get fairly good accuracies on these various probabilities. You could think about this as creating a function where we can input everything we know about a user, their listening patterns, and output roughly how satisfied they are by looking at their probabilities for moving between these states. This is a way for us to create a model where we can understand your current relationship with the audio content that you're consuming on the Spotify platform.
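
A deliberately simplified stand-in for this idea: estimate a transition-probability matrix between listener states from observed state sequences. The talk describes richer survival models over listening features; this sketch only shows the skeleton of state-transition modeling, with synthetic data.

```python
# Estimate month-over-month transition probabilities between invented
# listener states from toy observed sequences.
import numpy as np

states = ["new", "casual", "engaged", "super_engaged", "lapsing"]
rng = np.random.default_rng(5)

# Toy observed state sequences (one year each) for a handful of users.
sequences = [rng.integers(0, len(states), size=12) for _ in range(200)]

counts = np.zeros((len(states), len(states)))
for seq in sequences:
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1

transition_probs = counts / counts.sum(axis=1, keepdims=True)
print(np.round(transition_probs, 2))
```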

Oskar Stal (24:38): And then, from that, we have to create a good state space or good states that we can use in the reinforcement learning algorithm. Again, your state should represent where you are on this journey. Are you super satisfied, everything is great, you're listening to a lot of different music, high diversity, or are you listening only to five party tracks on Friday afternoon? All of that, we want to have represented in this state.

Oskar Stal (25:11): Essentially, we're here trying to build an encoder system that can encode your state into embeddings. And the goal here is to find embeddings that are really sensitive to both the actions that we can take and changes in that satisfaction that we talked about earlier. If there's an action we can take that will make some change there, that needs to be well represented in this state space that we're building.

Oskar Stal (25:37): We've also built a simulator, and this is a simulator that simulates how the user reacts when we show them certain content. You can imagine us saying, "We're going to play these five songs to the user," and the simulator can then say, "Okay, this is how the user will interact and react. These are essentially the events or the interactions that the user will make." And what's really good here is that this simulator doesn't have to be able to run in production. It can take in, as input, a lot more data that would be hard to gather in a production live setting. And it can also be a much more complicated model that would be hard for us to compute at scale in a production setting.

Oskar Stal (26:21): And then, what we do here is that we take the agent, or basically the recommendation algorithm, and train it against the simulator. You can imagine it like a chess player. We take our chess playing algorithm, playing chess against the chess simulator many, many times so it gets better and better at figuring out ways to win in chess. And it's the same for us here, but here, the goal is to find pathways through the recommendation that lead to better satisfaction. Again, the agent is playing and playing and playing against this simulator so that it learns how to do good recommendations.
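
A toy version of that loop: a simulator scores how well a recommended slate matches a hidden simulated taste, and the agent's parameters are nudged toward slates that earn more simulated reward. Everything here is illustrative; the real simulator and agent are far richer.

```python
# Toy agent-vs-simulator training loop: the simulator returns a reward for a
# slate, and simple random-search updates keep parameter changes that improve
# the simulated reward. Entirely synthetic and illustrative.
import numpy as np

rng = np.random.default_rng(6)
track_vecs = rng.normal(size=(200, 16))
user_taste = rng.normal(size=16)          # hidden "true" taste inside the simulator

def simulator(slate):
    """Simulated reward: how well the slate matches the hidden user taste."""
    return float(np.mean(track_vecs[slate] @ user_taste))

def agent_slate(weights, k=5):
    scores = track_vecs @ weights
    return np.argsort(-scores)[:k]

weights = np.zeros(16)
for _ in range(500):                      # "play against the simulator" many times
    candidate = weights + rng.normal(scale=0.1, size=16)
    if simulator(agent_slate(candidate)) > simulator(agent_slate(weights)):
        weights = candidate               # keep perturbations that do better

print("simulated reward after training:", simulator(agent_slate(weights)))
```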

Oskar Stal (27:02): And what we then do is that we use A/B tests. Basically, we're testing a lot of different agents on our real users to see how they're doing. And what's really interesting here is that we try to basically compare agents that are trained against really good simulators with agents that are trained against less good simulators. What we want to see obviously is that if your agent is trained on a good simulator, it does really well in a real A/B test with real users. And if it's trained against a so-so simulator, it does a little bit less well. We've been really excited to see this working with real users in A/B tests. It's very exciting and promising for us.

Oskar Stal (27:48): Okay, to try to summarize a little bit in the end, what are we really trying to do here, what are we going for? We're really trying to build a more connected, holistic system. Instead of having a fleet of transactional systems that are all operating in their own little space, we're trying to connect them together so we can have a holistic approach to recommendation. And we're trying to transition over to a more farsighted approach to content recommendation. We really want to optimize for a long-term, fulfilling content diet, rather than a click or a stream.

Oskar Stal (28:29): And what we're hoping here is by doing that, we can increase the amount of meaningful audio that you have in your life. Maybe when you're out walking, doing the dishes, vacuuming, running, whatever you're doing, how can you have more meaningful audio in those moments? At the same time, we're trying to help the creators to get more value for the content they create. Obviously, if they create something that you're enjoying, they're going to get more value for that.

Oskar Stal (29:02): And by having this loop spin, we really hope that our listeners can have a bit more fulfilling day generally by just having more interesting and good audio in their life, AKA the best Spotify for you. We're very excited about this next chapter in recommendation, and thank you all for taking the time and listening to our story. Bye bye.

