Scale Events
November 22, 2021

Inside Spotify's Content Recommendation Engine

The music streaming service has access to data at a scale that most companies can't match

While most people understand Spotify as a music engine—a way to play their favorite tunes on demand and share them with others—few are fully aware of the vast data and machine learning capabilities that power this system. The fact is, Spotify is more than just a music streaming service; it’s a powerful content recommendation engine that learns from billions of daily actions by users.  

Oskar Stål, vice president of personalization at Spotify, talked about his company’s multilayered recommendation process, which he called part of the firm’s “special magic touch,” at the recent artificial intelligence (AI) and machine learning (ML) conference Scale TransformX.


Spotify’s mission is twofold, he said: Allowing creative artists to live off their work, and allowing fans to enjoy and be inspired by these creations.

Recommendation at scale

Data is at the heart of every ML-powered application. While this is universally true, Spotify has access to data at a scale that most companies cannot match. As Stål pointed out in his talk, Spotify reaches over 365 million monthly active users across 178 global markets, and the numbers are growing. It has more than 70 million music tracks and over 2.9 million podcasts, constituting 200 petabytes of content, and its library grows every day.

Users of the platform perform over half a trillion collective actions every day, such as searching for music and creating playlists, and the service tracks it all. “You could see how this is a good place for machine learning,” Stål said. “We think about it as 365 million different experiences,” he added, “one experience for each user.”

Access to such rich data sources gives Spotify the opportunity to perform collaborative filtering at an unprecedented scale. Collaborative filtering is a fancy term to describe a simple concept: Users who share similar features are likely to have similar tastes. In other words, users A and B might be predicted to have similar tastes in music if they share similar demographic information, listen to similar artists, listen to a number of common playlists, listen at similar times of the day, and so on.

The goal of a collaborative filtering algorithm is to tease apart these different factors in an individual’s taste and select those that are the most predictive for recommending content. In addition, Spotify has the mission of making personalization central to its platform’s experience and putting it front and center on every user’s homepage.
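
To make the idea concrete, here is a minimal sketch of user-based collaborative filtering. It is not Spotify's implementation, which the talk did not describe at this level of detail. The play counts, user names, and the helper names `cosine` and `recommend` are all invented for illustration, and the sketch uses listening history only.

```python
import math

# Hypothetical play counts per user (illustrative data, not from Spotify).
plays = {
    "user_a": {"Taylor Swift": 30, "Miley Cyrus": 12, "Demi Lovato": 5},
    "user_b": {"Taylor Swift": 25, "Miley Cyrus": 20},
    "user_c": {"The Beatles": 40, "The Kinks": 8},
}

def cosine(u, v):
    """Cosine similarity between two sparse play-count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(target, plays, top_n=3):
    """Score artists the target has not played, weighted by user similarity."""
    scores = {}
    for other, history in plays.items():
        if other == target:
            continue
        sim = cosine(plays[target], history)
        for artist, count in history.items():
            if artist not in plays[target]:
                scores[artist] = scores.get(artist, 0.0) + sim * count
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Drop artists that only came from users with zero similarity.
    return [artist for artist, score in ranked[:top_n] if score > 0]

print(recommend("user_b", plays))
```

Because `user_b` overlaps with `user_a` but shares nothing with `user_c`, only the artist that `user_a` plays and `user_b` has not yet heard gets a positive score.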

Spotify users see the results in things such as the Daily Mix playlist, which constantly updates to bring them new content that Spotify thinks they will enjoy listening to. This helps Spotify’s recommendation engine facilitate 16 billion artist discoveries per month.

Multiple data sources offer competitive advantage

Spotify has access to a number of data sources that are a big part of the company’s competitive advantage. These sources include content data about songs and artists that comes from the web and third-party providers: artist biographies, song metadata, and more.

Another source is the raw, waveform audio profile of a song. Waveforms are tricky to work with from an ML perspective, but they can give a lot of insight into elements of a track such as dynamics, tempo, instrument profile, and more. These elements can be encoded as input features to an ML model that can give it more context when it comes to making recommendations and decisions.
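
As a rough illustration of turning a waveform into model inputs, the sketch below computes two simple, well-known audio descriptors from synthetic sine waves: root-mean-square level (a crude proxy for loudness) and zero-crossing rate (which rises with pitch and noisiness). The sample rate, signals, and function names are assumptions made for this example; the article does not say which audio features Spotify extracts.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this toy example)

def sine_wave(freq_hz, amplitude, seconds):
    """Generate a synthetic mono waveform as a list of floats."""
    n = int(SAMPLE_RATE * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]

def rms(samples):
    """Root-mean-square level, a rough proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return crossings / (len(samples) - 1)

quiet_low = sine_wave(220, 0.2, 1.0)    # quiet, low-pitched tone
loud_high = sine_wave(1760, 0.8, 1.0)   # loud, high-pitched tone

# One small feature vector per "track", ready to feed into a model.
features = {
    name: (rms(wave), zero_crossing_rate(wave))
    for name, wave in {"quiet_low": quiet_low, "loud_high": loud_high}.items()
}
for name, (level, zcr) in features.items():
    print(f"{name}: rms={level:.3f}, zcr={zcr:.3f}")
```

The louder, higher-pitched tone scores higher on both features, which is the kind of signal a model could use alongside metadata and behavioral data.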

Finally, Spotify’s most important and proprietary dataset comes from users’ daily actions, many of which are recorded and warehoused in order to give greater insight into user behavior and preferences. Because the key idea underlying collaborative filtering is that similar users behave similarly and have similar tastes, Stål noted, capitalizing on it requires rich data about all users to guide recommendations.

Spotify makes a series of high-quality, production-ready datasets, called “golden datasets,” available to engineers across different development teams (see Figure 1). At Spotify, “everyone does data production,” Stål said. “It means we have a very rich, but also very fragmented, data situation going on.”


Figure 1. Spotify creates high-quality, production-ready datasets, called “golden datasets,” that are available to engineers across different development teams and are published in an open-source portal called “Backstage.” Image source: Spotify

Because Spotify sees its golden datasets as its most strategic data, it wants them to be easily accessible, Stål said. “As a user of a golden dataset, you know what to expect; you know that it will be defined when it's delivered, when it's calculated; you know that you can trust the quality of it.”

Common models and tools

Spotify doesn’t just share its data across different development teams. It also provides a core suite of ML tools and models that can be used by engineers across the organization, even by those who are not ML experts. One way the company does this is to use embeddings.

An embedding is a low-dimensional vector representation of an object or entity. At Spotify, the most important embeddings are those of users, but songs, artists, and more get their own embeddings as well. Embeddings are created by an ML model and are designed to encode key information about the user or item they represent into a single, useful representation.

“You can think about this as fingerprints for everything we have,” Stål said. “Artists, tracks, listeners, playlists—anything you can imagine has a fingerprint in this embedding space.”

So, for example, two teenage girls from the United States who often listen to Miley Cyrus and Taylor Swift will likely be embedded much closer in space than either would be to a 62-year-old man from Great Britain who listens primarily to classic-rock bands such as the Beatles and the Rolling Stones.

It is easy to see how a collaborative filtering model takes advantage of these pairwise similarities to recommend artists such as Demi Lovato, Alessia Cara, and Selena Gomez to the first two users while recommending artists such as Badfinger and the Kinks to the other user (see Figure 2).


Figure 2. A hypothetical visualization of the Spotify musical artist embedding space. Image source: Spotify
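
The listener example can be mimicked with a toy calculation. The three-dimensional vectors below are hand-picked for illustration; real embeddings are learned by a model and have many more dimensions, and the listener labels are hypothetical.

```python
import math

# Hypothetical 3-dimensional embeddings (hand-picked, not learned).
listeners = {
    "teen_us_1": [0.90, 0.80, 0.10],
    "teen_us_2": [0.85, 0.75, 0.20],
    "classic_rock_gb": [0.10, 0.20, 0.90],
}

def distance(a, b):
    """Euclidean distance between two listeners in the embedding space."""
    return math.dist(listeners[a], listeners[b])

near = distance("teen_us_1", "teen_us_2")
far = distance("teen_us_1", "classic_rock_gb")
print(f"teen to teen: {near:.3f}, teen to classic rock: {far:.3f}")
```

The two listeners with similar habits end up a short distance apart, while the classic-rock listener sits much farther away.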


This leads to another point: embedding models provide a natural way to cluster users, or whatever other objects are embedded, into similar groups. That grouping can be used for making recommendations or for other operations such as a k-nearest neighbors search.

For example, an algorithm could take in as input the entire space of embeddings, along with a user, and find the 10 other users who are closest to that user in the embedding space. The algorithm could then aggregate all the tracks that those other users are currently listening to and create a discovery playlist based on that information. (This is not necessarily how Spotify’s discovery algorithm works in practice, but it is one way that embeddings can be used.)
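
A minimal sketch of that hypothetical algorithm follows. The user names, two-dimensional embeddings, and "now playing" tracks are invented, and the example uses 2 neighbors rather than 10 only because the toy dataset is tiny. As the article itself notes, this is not necessarily how Spotify's discovery works.

```python
import heapq
import math
from collections import Counter

# Hypothetical user embeddings and "now playing" tracks (illustrative only).
embeddings = {
    "ana": [0.9, 0.1], "ben": [0.8, 0.2], "cat": [0.85, 0.15],
    "dan": [0.1, 0.9], "eve": [0.2, 0.8],
}
now_playing = {
    "ana": "Track A", "ben": "Track B", "cat": "Track B",
    "dan": "Track C", "eve": "Track C",
}

def nearest_users(user, k):
    """Return the k users closest to `user` by Euclidean distance."""
    others = (u for u in embeddings if u != user)
    return heapq.nsmallest(
        k, others, key=lambda u: math.dist(embeddings[user], embeddings[u])
    )

def discovery_playlist(user, k=2):
    """Rank the tracks the k nearest neighbors are playing by popularity."""
    counts = Counter(now_playing[u] for u in nearest_users(user, k))
    return [track for track, _ in counts.most_common()]

print(nearest_users("ana", 2))
print(discovery_playlist("ana"))
```

A brute-force scan like this is fine for a handful of users; at the scale described in the article, some form of approximate nearest-neighbor index would presumably be needed instead.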

Once again, Spotify makes embeddings and other ML models available to all engineers across the organization, allowing them to incorporate personalization into every aspect of the Spotify platform.


Evaluation of recommendations

Spotify puts a great deal of emphasis on evaluating how its ML and personalization efforts affect the day-to-day user experience, Stål said. The key question the leadership and engineering teams ask themselves is, “What is a good recommendation?”

Obviously, there are a lot of answers to this question, but Spotify has found over the years that relevance is highly important—that is, content that is recommended to a user should be relevant to that user in some way, Stål said. Maybe it meshes particularly well with the user’s mood or state of mind. (Spotify’s many playlists for things such as sleep, energy, and focus are examples of this.) Or it might aid them in what they are undertaking at the moment, such as throwing a party or powering through a workout.

“We're not just optimizing for the current moment. We're not optimizing for the thing that is most likely to get clicked, not optimizing for the thing that is most likely to get streamed, or optimizing for just driving more listening time in the current moment,” Stål said. “Instead, we want a healthy journey for a lifetime of fulfilling content” so that subscribers choose to continue to pay each month for the service.

Another principle that Spotify’s recommendation algorithms operate on is that listeners like diversity. Many recommendation engines fall into feedback loops, recommending variations on the same type of content over and over. It’s as if the recommendation algorithm has gotten stuck: It has identified some particular artist or niche that you like and has found that recommending it to you will pretty consistently garner clicks. Spotify has gotten unstuck.

Reinforcement learning

One way Spotify avoids getting stuck is by using reinforcement learning, which Stål described as a type of “nudge” to guide users through the various recommendations (see Figure 3). “We are applying a series of actions through recommendations, trying to change your state and find a pathway through all audio content that will lead you to this better place. This is an approach, then, to try to optimize for a journey to a more fulfilling content diet.”


Figure 3. On the left are the actions Spotify takes: what it recommends, which track users will hear next, and so forth. Rewards are what users “give” to the service, which could include saving or liking a track. On the right is the recommendation algorithm itself, which is primarily about maximizing cumulative future rewards. Image source: Spotify
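
"Maximizing cumulative future rewards" is commonly formalized in reinforcement learning as a discounted return. The sketch below shows that standard calculation; the discount factor `gamma` and the reward sequences are assumptions chosen for illustration, since the talk did not specify how Spotify weights future rewards.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of future rewards, each discounted by gamma per step."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Hypothetical reward signals per recommendation step:
# 0 = no reaction, 1 = the user saved or liked the track.
clicky_now = [1, 0, 0, 0]   # immediate reward, nothing afterwards
slow_burn = [0, 1, 1, 1]    # pays off later in the session

print(discounted_return(clicky_now), discounted_return(slow_burn))
```

An agent that compares whole sequences this way can prefer the second path even though the first earns a reward sooner, which matches the stated aim of not optimizing only for the current moment.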


Stål said that one key principle on which Spotify’s recommendation agent operates is to learn from exploratory users, those who are frequently searching for new content and expanding their musical horizons. It looks at how these users are acting: Which artists and tracks are they searching for? Which radio stations are they listening to? Which playlists are they following?

Aggregating this information across all exploratory users, Spotify can then define transition probabilities between environment states. For example, if a user is listening to a radio station based on Bon Jovi, Spotify can calculate the probability of that user switching to the John Mellencamp radio station. From there, the agent can take actions in this simulated environment that result in a model of how exploratory users behave. That model can then be used to inform recommendations to other users. The goal is to develop a probabilistic model of trajectories over states to understand how users derive long-term satisfaction from the content they listen to.
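
One simple way to estimate such transition probabilities is to count observed station-to-station switches and normalize the counts, as in a first-order Markov chain. The sessions below are invented, and treating the current station as the entire state is a simplifying assumption for this sketch rather than a description of Spotify's model.

```python
from collections import Counter, defaultdict

# Hypothetical station sequences from exploratory users (illustrative only).
sessions = [
    ["Bon Jovi", "John Mellencamp", "Bruce Springsteen"],
    ["Bon Jovi", "John Mellencamp"],
    ["Bon Jovi", "Def Leppard"],
    ["John Mellencamp", "Bruce Springsteen"],
]

def transition_probabilities(sessions):
    """Estimate P(next station | current station) from observed sequences."""
    counts = defaultdict(Counter)
    for stations in sessions:
        for current, nxt in zip(stations, stations[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for current, nexts in counts.items()
    }

probs = transition_probabilities(sessions)
print(probs["Bon Jovi"])
```

In this toy data, two of the three switches away from the Bon Jovi station go to John Mellencamp, so that transition gets a probability of two-thirds.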

From there, Spotify can validate different reinforcement learning agents that it trains using A/B testing and then deploy the best ones to production.
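
For a sense of what validating an agent with an A/B test can involve, here is a sketch of a standard two-proportion z-test comparing a control group with a group served by a new agent. The metric (users who saved at least one track) and all counts are hypothetical; the article does not say which metrics or statistical tests Spotify uses.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for a difference in two proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: users who saved at least one track during the test.
z, p = two_proportion_z_test(
    success_a=1000, n_a=10000,   # control: current agent
    success_b=1100, n_b=10000,   # treatment: candidate agent
)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the two groups is unlikely to be noise, which is one input into deciding whether a candidate agent is ready for production.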

The ultimate goal

Spotify is much more than just a music streaming platform, Stål said. It has developed an entire ethos around providing quality recommendations to users. Personalization is at the core of its DNA, and it wants to give its users listening experiences that are adapted to each one’s unique needs.

The goal is to keep users satisfied over the long term so that they continue to subscribe to the service and engage with the platform.

Ultimately, as Stål explained, Spotify wants to “increase the amount of more meaningful audio in your life.” ML is at the heart of this strategy, allowing the company to learn from the vast amounts of data it has to architect effective recommendation systems at scale.

Learn More

For more details about Spotify’s recommendation engine, watch Stål’s talk, “Creating Personalized Listening Experiences with Spotify,” and read the full transcript.
