Unpacking How Ads Ranking Works at Pinterest

Aayush Mudgal describes how Pinterest serves advertisements and, in particular, how machine learning is used to serve ads at large scale. He covers ad marketplaces and the ad delivery funnel, the ad serving architecture, and two of the main problems: ad retrieval and ranking. Finally, he discusses some of the challenges and solutions for training and serving large models.
Aayush Mudgal, Staff Machine Learning Engineer at Pinterest, presented a session titled "Unpacking How Ads Ranking Works at Pinterest" at QCon San Francisco 2023. In it, he walked through how Pinterest uses deep learning and big data to tailor relevant advertisements to its users.
As with most online platforms, a personalized experience is at the heart of Pinterest. This experience is powered by a variety of machine learning (ML) applications, each of which tries to learn complex patterns from the large-scale data the platform collects.
In his talk, Mudgal focused on one part of the experience: serving advertisements. He discussed in detail how machine learning is used to serve ads at large scale. He then covered ad marketplaces and the ad delivery funnel, described the typical parts of the ad serving architecture, and went into two of the main problems: ad retrieval and ranking. Finally, he discussed how to monitor system health during model training and wrapped up with some of the challenges and solutions for serving large models.
Content Recommendation
Mudgal first presented the characteristics of a content recommendation system. Every social media platform has millions or billions of content items that it could potentially show to users. The goal is to find items that are relevant to a particular user, but since the content catalog and user base are so large, a platform like Pinterest cannot precompute the relevance probability of each content item for each user.
Instead, the platform needs a system that can predict this probability quickly, within hundreds of milliseconds. It must also handle a high number of queries per second (QPS). Finally, it needs to respond to users' interests as they change over time. To capture all of these nuances, the recommendation system has to solve a multi-objective optimization problem.
When a user interacts with a particular item on a platform, they are often presented with a variety of similar content. This is a crucial moment where targeted advertisements can come into play. These ads aim to bridge the gap between users and advertisers' content on the platform. The goal is to engage users with relevant content that can potentially lead them from the platform to the advertiser's website.
This is a two-sided marketplace. Advertising platforms such as Pinterest, Meta, and Google connect users with advertisers and their relevant content. Users visit the platform to engage with content, and advertisers pay the platform to show their content to those users in the hope that they engage with it. The platform wants to maximize value for the users, the advertisers, and itself.
Advertising Marketplaces
Advertisers want their content shown to users. The goal could be as simple as building awareness for a brand, or driving more clicks on the platform itself. Advertisers can also choose how much they value a particular ad being shown on the platform.
Advertisers have the option to select from two main bidding strategies. One approach allows advertisers to pay a predetermined amount for each impression or interaction generated via the platform. Alternatively, they can set a defined budget and rely on the platform’s algorithms to distribute it optimally through automated bidding processes.
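The two bidding strategies can be contrasted in a minimal Python sketch. The class names, the even-spend pacing target, the 0.1 adjustment step, and the 2x cap are all illustrative assumptions rather than Pinterest's actual automated bidding logic, which the talk summary does not describe.

```python
from dataclasses import dataclass


@dataclass
class FixedBid:
    """Advertiser pays a predetermined amount per impression or interaction."""
    amount: float

    def bid(self) -> float:
        return self.amount


@dataclass
class AutoBid:
    """Advertiser sets a budget; the platform adjusts bids to pace spend (hypothetical)."""
    daily_budget: float
    base_bid: float
    spent: float = 0.0
    multiplier: float = 1.0

    def update_pacing(self, fraction_of_day: float, step: float = 0.1) -> None:
        # Assumed target: spend the budget evenly over the day.
        target = self.daily_budget * fraction_of_day
        if self.spent > target:
            # Ahead of schedule: lower the multiplier, never below zero.
            self.multiplier = max(0.0, self.multiplier - step)
        else:
            # Behind schedule: raise it, capped at 2x to avoid runaway bids.
            self.multiplier = min(2.0, self.multiplier + step)

    def bid(self) -> float:
        # Stop bidding once the budget is exhausted.
        if self.spent >= self.daily_budget:
            return 0.0
        return self.base_bid * self.multiplier


fixed = FixedBid(amount=1.50)
auto = AutoBid(daily_budget=100.0, base_bid=2.0, spent=60.0)
auto.update_pacing(fraction_of_day=0.5)  # spent 60 against a target of 50
print(fixed.bid(), round(auto.bid(), 2))  # 1.5 1.8
```

With a fixed bid the advertiser controls the price directly; with automated bidding the advertiser controls only the total budget, and the platform trades price for delivery over time.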
Next, advertisers choose their creative, or image content. Before serving the creative, the advertising platform needs to define what makes a good probability score for deciding whether to serve this particular content to a user. One such score is a click prediction: given a user and the journey they are taking on the platform, what is the probability that this user will click on the content?
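Click prediction is commonly framed as binary classification. The sketch below assumes a logistic model with hypothetical feature names and hand-picked weights (a production system would learn them, typically with a deep model). It then multiplies the predicted click probability by a cost-per-click bid to get an expected value per impression, a standard way of comparing ads in an auction; the summary does not say this is Pinterest's exact formula.

```python
import math

# Hypothetical learned weights for a few illustrative features.
WEIGHTS = {"topic_match": 2.0, "past_click_rate": 3.0, "is_video": -0.5}
BIAS = -4.0


def sigmoid(z: float) -> float:
    # Numerically stable form: avoids overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_click(features: dict) -> float:
    """p(click | user, context, ad) from a logistic model; unknown features are ignored."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())
    return sigmoid(z)


def expected_value(bid_per_click: float, p_click: float) -> float:
    """Expected revenue per impression for a cost-per-click bid."""
    return bid_per_click * p_click


p = predict_click({"topic_match": 1.0, "past_click_rate": 0.5, "is_video": 0.0})
print(round(p, 3), round(expected_value(2.0, p), 3))  # 0.378 0.755
```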
However, maximizing clicks alone might not give the best relevance on the platform, because it can promote spammy content. Platforms therefore sometimes also have shadow predictions, such as "good" clicks, hides, saves, or reposts, that try to capture the user journey more holistically. Some platforms also have additional advertising objectives such as conversion optimization, which aims to drive more sales on the advertiser's website; this is challenging to capture because the conversion happens off the platform.
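One simple way to combine such predictions is a weighted sum of per-objective probabilities. The objective names follow the text, but the weights and example probabilities below are hypothetical. The sketch shows how an ad with a high click probability but a high hide probability can score below a more useful ad.

```python
# Hypothetical weights: engagement adds value, hides subtract it.
OBJECTIVE_WEIGHTS = {
    "click": 0.2,
    "good_click": 1.0,
    "save": 0.8,
    "repost": 0.5,
    "hide": -2.0,
}


def combined_utility(predictions: dict) -> float:
    """Weighted sum of the per-objective probabilities a ranking model predicts."""
    return sum(w * predictions.get(name, 0.0) for name, w in OBJECTIVE_WEIGHTS.items())


clickbait = {"click": 0.30, "good_click": 0.02, "save": 0.01, "hide": 0.10}
useful = {"click": 0.10, "good_click": 0.07, "save": 0.05, "hide": 0.01}
print(round(combined_utility(clickbait), 3), round(combined_utility(useful), 3))  # -0.112 0.11
```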
Now suppose the platform wants to expand the system to more content types, such as videos and collections. Not only does it need to make the predictions described above, it also needs to understand what counts as a good video view on the platform.
Finally, the different platform surfaces also have different contexts. This could be a user’s home feed, where the platform doesn’t have any context or relevance information at that particular time, or a search query where the user has an intent behind it.
Given this complexity, the platform needs to make sure that, as it scales, it can still make all of these predictions in a performant way. Some of the design decisions made here are also meant to support scaling and product growth.
Ads Serving Infrastructure
Mudgal then presented a high-level overview of the ads serving infrastructure at Pinterest. When a user interacts with the platform, the platform needs to fetch content that it wants to show to the user. The user’s request is passed in via a load balancer to an app server. This is then passed to an ad server which returns ads that are inserted into the user’s feed.
Figure 1: Ads Serving Infrastructure High Level Overview
The ad server needs to do this with very low latency, on the order of hundreds of milliseconds end to end. Its input is typically rather sparse: for example, a user ID, the user's IP address, and the time of day.
The first task is to retrieve features for this user. This could be things like the user’s location from their IP address, or how this user has interacted on the platform in the past. These are usually retrieved from a key-value store where the key is the user ID and the values are the features.
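The enrichment step can be sketched with in-memory dictionaries standing in for the key-value feature store and for an IP-to-region lookup. All names and values here are hypothetical; a real deployment would call remote services.

```python
from dataclasses import dataclass


@dataclass
class AdRequest:
    """The sparse input the ad server starts from."""
    user_id: str
    ip_address: str
    hour_of_day: int


# Hypothetical stand-ins: a key-value feature store keyed by user ID,
# and an IP-to-region lookup.
USER_FEATURE_STORE = {
    "u42": {"recent_topics": ["hiking", "recipes"], "clicks_30d": 17},
}
IP_TO_REGION = {"203.0.113.7": "US-CA"}


def enrich(request: AdRequest) -> dict:
    features = {"hour_of_day": request.hour_of_day}
    features["region"] = IP_TO_REGION.get(request.ip_address, "unknown")
    # A user with no stored features (e.g. a new signup) gets defaults, not an error.
    features.update(USER_FEATURE_STORE.get(request.user_id, {}))
    return features


print(enrich(AdRequest(user_id="u42", ip_address="203.0.113.7", hour_of_day=20)))
```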
Once the system has enriched the feature space, the features are passed to a candidate retrieval phase, which sifts through billions of content items to find the hundreds or thousands of candidates that could be shown to the user. These candidates are then passed to a ranking service, which uses heavyweight models to predict the user's probability of interacting with each item across multiple objectives (clicks, good clicks, saves, reposts, hides).
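The retrieval-then-ranking funnel can be sketched as two functions: a cheap similarity score over the whole catalog, then an expensive model applied only to the survivors. This minimal sketch assumes embedding-based retrieval (one common approach; the summary does not specify Pinterest's retrieval method), and a lookup table stands in for the heavyweight ranking model.

```python
import heapq
from typing import Callable


def dot(a: list, b: list) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_emb: list, catalog: dict, k: int) -> list:
    """Cheap stage: score items by embedding similarity and keep the top k.
    A real system would query an approximate nearest-neighbor index instead
    of scanning the whole catalog."""
    return heapq.nlargest(k, catalog, key=lambda item: dot(user_emb, catalog[item]))


def rank(candidates: list, heavy_model: Callable[[str], float], n: int) -> list:
    """Expensive stage: run the heavyweight model only on the retrieved candidates."""
    return sorted(candidates, key=heavy_model, reverse=True)[:n]


catalog = {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.8, 0.0], "d": [0.0, 1.0]}
candidates = retrieve([1.0, 0.0], catalog, k=2)  # ['a', 'c']
heavy_scores = {"a": 0.1, "c": 0.4}  # hypothetical ranking-model output
print(rank(candidates, heavy_scores.get, n=1))  # ['c']
```

The split matters because the heavy model's cost is paid only for k candidates rather than for the full catalog, and its ordering can differ from the retrieval ordering.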
The ranking service also typically does its own feature extraction, because the system cannot transmit all the content features in a candidate ranking request efficiently. Hundreds to thousands of candidates are typically sent to the ranking service, and sending all of their features along would bloat the request.
Instead, these features are fetched through a local in-memory cache (which could be something like LevelDB), and an external routing layer can be used to maximize cache hits. Finally, the ranking service sends the ads back to the ad server.
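A minimal sketch of the caching idea, assuming the routing layer hashes content IDs to ranker shards and each shard keeps an LRU cache in front of a remote feature store. The hash-based routing, the LRU eviction policy, and all names are assumptions for illustration; the summary only mentions a local cache and an external routing layer.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


def shard_for(content_id: str, num_shards: int) -> int:
    """Stable hash routing: the same content ID always goes to the same shard.
    (Python's built-in hash() is salted per process, so it is not used here.)"""
    digest = hashlib.sha256(content_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class LRUFeatureCache:
    """Per-shard in-memory cache in front of a slower remote feature store."""

    def __init__(self, capacity: int, fetch_remote: Callable[[str], dict]):
        self.capacity = capacity
        self.fetch_remote = fetch_remote
        self.data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, content_id: str) -> dict:
        if content_id in self.data:
            self.hits += 1
            self.data.move_to_end(content_id)  # mark as most recently used
            return self.data[content_id]
        self.misses += 1
        value = self.fetch_remote(content_id)
        self.data[content_id] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry
        return value


def fetch_remote(content_id: str) -> dict:
    return {"id": content_id, "embedding": [0.0, 1.0]}  # hypothetical slow lookup


shards = [LRUFeatureCache(capacity=1000, fetch_remote=fetch_remote) for _ in range(4)]
for cid in ["pin1", "pin2", "pin1", "pin3", "pin2"]:
    shards[shard_for(cid, len(shards))].get(cid)
print(sum(s.hits for s in shards), sum(s.misses for s in shards))  # 2 3
```

Because a given content ID is always routed to the same shard, its features are fetched remotely once and then served from that shard's cache; with random routing, each shard would have to warm its own copy.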
In most traditional machine learning systems, the feature values that were used when an ad was shown at a particular time are very important for training machine learning models. So, in addition to the synchronous request that fetches these features, an asynchronous request is sent to a feature logging service that logs them. Finally, to make the system more resilient, there are fallback candidates: if any part of the system fails or is unable to retrieve candidates, fallback candidates can be shown so that the user always sees some content on the platform.
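The split between the synchronous feature fetch, the asynchronous logging call, and the fallback path can be sketched with asyncio. The fallback list, the timeout value, and the stub functions are hypothetical; the sketch only illustrates the control flow described above.

```python
import asyncio
from typing import Awaitable, Callable

FALLBACK_ADS = ["fallback_ad_1", "fallback_ad_2"]  # hypothetical, precomputed
LOGGED: list = []         # stands in for the feature logging service's storage
_background: set = set()  # strong references so pending tasks aren't collected


async def log_features(request_id: str, features: dict) -> None:
    await asyncio.sleep(0)  # placeholder for a network call to the logging service
    LOGGED.append({"request_id": request_id, "features": features})


async def serve(
    request_id: str,
    fetch_features: Callable[[str], Awaitable[dict]],
    rank: Callable[[dict], list],
    timeout_s: float = 0.2,
) -> list:
    try:
        # Synchronous path: the response depends on these features.
        features = await asyncio.wait_for(fetch_features(request_id), timeout_s)
        ads = rank(features)
    except Exception:
        # Timeout or failure anywhere on this path: still show the user something.
        return FALLBACK_ADS
    # Asynchronous path: record the exact features used, without delaying the reply.
    task = asyncio.create_task(log_features(request_id, features))
    _background.add(task)
    task.add_done_callback(_background.discard)
    return ads or FALLBACK_ADS


async def fetch_ok(request_id: str) -> dict:
    return {"user": request_id, "clicks_30d": 3}


async def fetch_slow(request_id: str) -> dict:
    await asyncio.sleep(1)  # exceeds the timeout used below
    return {}


def rank_stub(features: dict) -> list:
    return ["ad_a", "ad_b"]


async def main():
    good = await serve("r1", fetch_ok, rank_stub)
    fallback = await serve("r2", fetch_slow, rank_stub, timeout_s=0.01)
    await asyncio.gather(*_background)  # let pending log writes finish before exiting
    return good, fallback


good, fallback = asyncio.run(main())
print(good, fallback, len(LOGGED))  # ['ad_a', 'ad_b'] ['fallback_ad_1', 'fallback_ad_2'] 1
```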
