On January 24, Google released a new Privacy Sandbox proposal for public discussion: the Topics API. While the Topics API breaks from the bird themes of 2021, it replaces the FLoC proposal as Chrome’s proposed way to target advertising based upon a user’s general interests. The proposal is still in its very early stages and much is subject to change (or to be replaced completely), but it’s good to know some of the basics.
Let’s dive into the Topic 😉
What is the Topics API?
The Topics API is Google Chrome’s new proposal for personalizing advertising based upon a user’s general interests without relying on a user identifier such as a third-party cookie. It is a mechanism to enable interest-based advertising without having to track the sites a user visits.
How does it work?
Simple Answer
The Chrome browser assigns a topic classification for each web page visited by a user. The list of available topics is human-curated and open for anyone to view to ensure no sensitive categories are included. At the end of a set time period, currently defined as one week, the five most frequently visited topics are assigned to the user and stored in the browser. This happens each week, with the topics from the three most recent weeks being stored.
When the user then accesses a new web page containing ad space, the browser can provide three of the topics assigned (one from each defined time interval) to ad tech platforms embedded on the page responsible for displaying advertisements. This allows for advertisements displayed to the user to be personalized based upon the topics of interest as determined by their past browsing history. All of this is done without the use of a personal identifier.
Technical Answer
As outlined in the Topics API explainer, there are three core tasks to be completed:
- Map a website hostname to topics of interest
- Calculate the top topics for a user based upon recent browsing activity
- Provide topics of interest to ad tech platforms to enable the selection of appropriate ads
The proposed solution to determine the topics of interest is to use in-browser machine learning (ML) to infer topics from the hostnames of pages visited by the user. As currently proposed, the ML model would be a classifier model that would be distributed with the browser so as to be freely available and openly developed.
The browser would infer topics based upon the hostnames of the pages visited over a defined period of browsing activity (named an epoch in the proposal). This is currently proposed to be one week. At the end of each epoch, the browser would compile the list of pages meeting the following criteria:
- The page was visited during that epoch
- The page included code that called the Topics API method
- The Topics API was enabled (not blocked by the user or page)
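As a rough illustration of the three criteria above, the following sketch filters a list of page visits down to the hostnames that would count toward an epoch. The `Visit` record and its field names are hypothetical and only model the criteria as described here; they are not part of the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    """Hypothetical record of one page visit (field names are assumptions)."""
    hostname: str
    epoch: int               # index of the week in which the visit happened
    called_topics_api: bool  # did code on the page call the Topics API?
    topics_enabled: bool     # False if the user or the page blocked the API


def eligible_hostnames(visits, epoch):
    """Return hostnames of visits that meet all three criteria for `epoch`."""
    return [
        v.hostname
        for v in visits
        if v.epoch == epoch and v.called_topics_api and v.topics_enabled
    ]


visits = [
    Visit("kitties.example", 0, True, True),
    Visit("news.example", 0, False, True),    # no Topics API call on the page
    Visit("shop.example", 0, True, False),    # API blocked by user or page
    Visit("kitties.example", 1, True, True),  # belongs to a different epoch
]
print(eligible_hostnames(visits, epoch=0))
```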
From that list, the on-device ML classifier would run and map the hostnames to the list of available topics from the topics taxonomy. The top five most frequently visited topics would be stored for that epoch with the most recent three epochs being saved.
At this point, three groups (one for each of the most recent three epochs) of five topics (determined by frequency of access) are stored in the user’s browser.
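To make the bookkeeping concrete, here is a minimal sketch of how the top topics per epoch could be computed and stored. The `HOSTNAME_TOPICS` lookup table is a hypothetical stand-in for the on-device ML classifier, and the topic names are invented for illustration. The sketch returns fewer than five topics when fewer were seen; how the browser would handle that case is not covered here.

```python
from collections import Counter, deque

# Hypothetical stand-in for the on-device ML classifier (hostname -> topics).
HOSTNAME_TOPICS = {
    "kitties.example": ["cats", "pets"],
    "hoops.example": ["basketball"],
    "recipes.example": ["cooking"],
}

TOPICS_PER_EPOCH = 5  # top topics kept for each epoch
EPOCHS_KEPT = 3       # number of most recent epochs stored


def top_topics(hostnames, classifier=HOSTNAME_TOPICS, n=TOPICS_PER_EPOCH):
    """Count how often each topic was visited and return the n most frequent."""
    counts = Counter()
    for hostname in hostnames:
        counts.update(classifier.get(hostname, []))
    return [topic for topic, _ in counts.most_common(n)]


# A deque with maxlen drops the oldest epoch automatically.
stored_epochs = deque(maxlen=EPOCHS_KEPT)
weekly_visits = [
    ["kitties.example", "kitties.example", "hoops.example"],
    ["recipes.example"],
    ["hoops.example", "recipes.example", "hoops.example"],
    ["kitties.example"],
]
for hostnames in weekly_visits:
    stored_epochs.append(top_topics(hostnames))

print(list(stored_epochs))
```

After four weeks of visits, only the three most recent groups remain in `stored_epochs`; the first week has been discarded.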
When a user then accesses a new page with ad inventory, the ad tech embedded on the page can call the Topics API to return information about the user’s topics of interest. When the call is made, the Topics API returns an array of three topics, one from each of the three stored groups (epochs). The browser randomly determines which topic is returned from each epoch. In addition, there is a 5 percent chance that a returned topic is instead selected at random from the full topics taxonomy, to further reduce the ability to identify the user. As currently proposed, each entry in the array includes a numeric value identifying the topic within the full taxonomy, the taxonomy version to use for the lookup, and the version of the classifier ML model used to infer the topics.
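The selection step can be sketched as follows. This is a simplified model and not the browser's implementation: the function name, the dictionary keys, and the version strings are assumptions made for illustration, and the sketch re-rolls its random choices on every call, which a real implementation may not do.

```python
import random

RANDOM_TOPIC_PROBABILITY = 0.05  # the 5 percent chance described above


def topics_for_call(epochs, taxonomy_ids, rng=random,
                    taxonomy_version="1", classifier_version="1"):
    """Pick one topic ID per stored epoch, occasionally replaced by noise.

    epochs: list of lists of topic IDs (the top topics for each epoch).
    taxonomy_ids: every topic ID in the full taxonomy.
    rng: any object providing random() and choice(), e.g. random.Random(seed).
    """
    results = []
    for top_topics in epochs:
        if not top_topics:
            continue  # nothing stored for this epoch
        if rng.random() < RANDOM_TOPIC_PROBABILITY:
            topic_id = rng.choice(taxonomy_ids)  # random topic from taxonomy
        else:
            topic_id = rng.choice(top_topics)    # one of the user's top topics
        results.append({
            "value": topic_id,
            "taxonomy_version": taxonomy_version,
            "classifier_version": classifier_version,
        })
    return results


taxonomy_ids = list(range(1, 351))  # 350 topics, IDs invented for illustration
epochs = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
for entry in topics_for_call(epochs, taxonomy_ids, rng=random.Random(0)):
    print(entry)
```

Passing a seeded `random.Random` makes the sketch reproducible for experiments; the caller cannot tell from a single entry whether it reflects a real interest or the random replacement.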
An important caveat to this process is that API callers can only receive topics they’ve already observed from the user. The goal of this provision is to not share information with more entities than is currently possible with third-party cookies. An API caller is considered to have observed a topic if it has called the Topics API in code included on a site which the Topics API has mapped to the topic in question. The full Topics API explainer provides a good example of this logic. Here is a further simplified scenario:
- Chrome maps hostname kitties.example to the topics taxonomy entry “cats”
- Code from adtech.example is included on kitties.example website
- The Topics API method must be called on the page by adtech.example itself, so the adtech.example code would need to load via an iframe
- A particular user then visits a page on kitties.example website
- Adtech.example code calls Topics API method on the visited page
- One of the topics inferred in the browser for kitties.example is “cats”
- Adtech.example in this case would be considered to have observed the topic “cats” for that specific user
- Now, when the user visits othersite.example (which also has adtech.example embedded) and adtech.example calls the Topics API to return the topics for the user, “cats” can be returned to adtech.example if it is among the topics selected, and thus used to personalize the advertisement displayed
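The scenario above can be modeled with a small sketch that tracks which topics each API caller has observed and filters what that caller may receive. The `ObservationLog` class and its method names are hypothetical; they only mirror the rule as described in this article.

```python
from collections import defaultdict


class ObservationLog:
    """Tracks which topics each API caller has observed for one user."""

    def __init__(self):
        self._observed = defaultdict(set)  # caller hostname -> set of topics

    def record_call(self, caller, site_topics):
        """The caller invoked the Topics API on a site mapped to site_topics."""
        self._observed[caller].update(site_topics)

    def filter_for(self, caller, candidate_topics):
        """Keep only the candidate topics this caller has already observed."""
        seen = self._observed.get(caller, set())
        return [topic for topic in candidate_topics if topic in seen]


log = ObservationLog()
# adtech.example calls the API on kitties.example, which maps to "cats".
log.record_call("adtech.example", ["cats"])

# Later, on othersite.example, the browser has selected these topics.
selected = ["cats", "basketball"]
print(log.filter_for("adtech.example", selected))    # only "cats" was observed
print(log.filter_for("newcomer.example", selected))  # nothing observed yet
```

A caller with no prior observations receives an empty list, which is the case discussed in the next paragraph.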
With this caveat in place, it is possible that the ad tech platforms on a site would not have “observed” any topics previously for that user. In this case, no topics would be returned by the Topics API. It will be interesting to see how this aspect of the proposal evolves given the involvement of the UK’s Competition and Markets Authority, as it obviously provides an advantage to ad tech platforms with wider reach which are embedded on a large number of websites.
Why is this replacing the FLoC proposal?
Following the FLoC origin trials in 2021 there was a lot of feedback from privacy groups and advertising technology companies. This proposal aims to address many of the concerns in the following ways:
Reduce the “fingerprinting” surface
One of the main concerns with FLoC was that the cohort IDs provided could be used as an additional input to identify a user (thus “widening the fingerprint surface”). The Topics API aims to address this by using a smaller number of coarse-grained topics (currently 350, expected to end up between a few hundred and a few thousand), randomly returning topics from the user’s top 5, providing a completely random topic 5 percent of the time, limiting the frequency at which a site can “learn” a new topic from the user, assigning different topics for the same user across different sites, and limiting the topics accessible based upon past topics observed.
Address sensitive topics
The FLoC proposal relied upon a machine learning process to group similar users into cohorts based upon browsing behavior. A concern was that this process could group users based upon a sensitive topic such as ethnicity or sexual orientation. The Topics API addresses this by making the list of available topics human-curated and fully public. This process makes it possible to ensure that any sensitive topics are removed from the taxonomy. In addition, both sites and users can opt out of the API so as not to be included in any of the browser logic.
Additional user controls
As mentioned, the Topics API will give opt-out abilities to both the user as well as a website. A user opting out would not have topics stored nor made available based upon their browsing history. A site opting out would ensure that their webpages would not be considered in the topics definition process for any of their website users. In addition to these rights, the usage of human-readable topic classifications makes it easy for users to understand the topics that have been assigned to them. A user-friendly UX in the browser can then be developed to allow them to remove topics from the list or clear the list altogether. This additional transparency gives the user more intuitive control of how they are targeted with advertising.
What is the next step?
I can’t emphasize this enough: this is still in the very early stages of development and currently only in public discussion. A testable solution is expected to enter origin trials in the middle portion of 2022. At this stage, it is important for advertisers to understand the core concepts and to weigh in on the public discussion with feedback if they think there are requirements which are not sufficiently considered.
What are the implications from the advertiser perspective?
As currently proposed, the Topics API would help to enable interest-based ads personalization for more granular targeting. From the advertiser perspective, this can help maintain one of the primary methods of personalized ad targeting today. This proposal, however, favors ad tech platforms, which are the ones able to actually access the topics stored for the user. The advertisers and publishers with whom the user is actually interacting are unlikely to have direct visibility into the user’s topics due to the “past topic observation” requirement. This would leave advertisers at the mercy of their ad tech vendors for these general user interest insights, which leaves a lot to be desired from both the user and advertiser perspective.
There is a lot more to come with this as well as other Privacy Sandbox proposals in the works. We will be watching this closely at InfoTrust as things progress and we get nearer to planned third-party cookie deprecation in 2023.