Machine learning at Spotify: You are what you stream

Vinyl (source: Pxhere.com)

In this episode of the Data Show, I spoke with Christine Hung, head of data solutions at Spotify. Prior to joining Spotify, she led data teams at the NY Times and at Apple (iTunes). Since she has led teams at three different companies, I wanted to hear her thoughts on digital transformation and to learn how she approaches the challenge of building, managing, and nurturing data teams.

I also wanted to learn more about what goes into building a recommender system for a popular consumer service like Spotify. Engagement should clearly be the most important metric, but there are other considerations, such as introducing users to new or “long tail” content.

Here are some highlights from our conversation:

Recommenders at Spotify

For us, engagement always comes first. At Spotify, we have a couple hundred people who are focused solely on user engagement, and this is the group that creates personalized playlists for you, like Discover Weekly or your Daily Mix. We know our users love discovery and see Spotify as a very important platform for discovering something new, but there are also times when people just want some music playing in the background that fits the mood. But again, we don’t have a specific agenda in terms of what we should push. We want to give you what you want so that you are happy, which is why we have invested so much in understanding people through music. If we believe you might like some “long tail” content, we will recommend it to you because it makes you happy, but we will do the same for the top 100 tracks if we believe you will enjoy them.

Music is like a mirror

Music is like a mirror: it tells people a lot about who you are and what you care about, whether you like it or not. We love to say “you are what you stream,” and that is so true. As you can imagine, we invest a lot in our machine learning capabilities to predict people’s preferences and context, and, of course, all the data we use to train the models is anonymized. We take in large amounts of anonymized training data to develop these models, test them against different use cases, analyze the results, and use what we learn to improve the models.

To give you a personal example of how this works, you can learn a lot about me just from what I stream. You will see that I use my running playlist only on weekends in the early morning, and that a lot of children’s songs are streamed at my house between 5 p.m. and 7 p.m. I also have a lot of tango and salsa playlists that I created and followed. So what does that tell you? It tells you that I am probably a weekend runner, which means I have some kind of affinity for fitness; it tells you that I am probably a mother who plays songs for my child after I get home from work; and it tells you that I like tango and salsa, so I am probably a dancer, too. As you can see, we are investing a lot in understanding people’s context and preferences so we can start capturing different moments of their lives. And, of course, the more we understand your context, your preferences, and what you are looking for, the better we can customize your playlists for you.
