The digital world has brought unprecedented convenience and connectivity but also raised significant concerns about data privacy. As we share more of our lives online, the need for robust privacy-enhancing technologies has become paramount. On-device learning has emerged as a powerful tool to protect personal data while enabling advanced capabilities. In this blog, we will explore on-device learning, its role in enhancing privacy, and how it’s used.
On-device learning, sometimes referred to as federated learning, is a machine learning approach in which models are trained directly on a user’s device, using only the data available there. Only updated model parameters are sent to a remote server or cloud. This means that a user’s smartphone, tablet, or other device can learn and adapt to their preferences without constantly sending their data to remote servers. This gives users more control over their data, protects their privacy, and reduces the need to send raw individual user data to external servers.
On-device learning operates with the following four principles:
With on-device learning, online retailers can gain insights into consumers’ overall preferences and behaviors without tracking any individual. Here’s how it works: each consumer’s device downloads the current model and improves it by learning from the data on that device. The model updates from each device are then collected, aggregated, and used to improve the central model. Thus, marketers learn only the overall purchase patterns and behaviors, without ever learning any individual consumer’s preferences or behavior.
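The download-train-aggregate loop described above can be sketched in a few lines. This is a deliberately simplified federated-averaging example using a toy logistic-regression model; all names and data are illustrative, and production systems add secure aggregation and other protections on top:

```python
import numpy as np

def local_update(global_weights, features, labels, lr=0.1, epochs=5):
    """Train a toy logistic-regression model on one device's private data,
    starting from the shared global weights. Data never leaves this function."""
    w = global_weights.copy()
    for _ in range(epochs):
        preds = 1 / (1 + np.exp(-features @ w))          # sigmoid
        grad = features.T @ (preds - labels) / len(labels)
        w -= lr * grad
    return w

def federated_round(global_weights, devices):
    """One round: every device trains locally, and only the resulting
    weights are sent back; the server simply averages them."""
    updates = [local_update(global_weights, X, y) for X, y in devices]
    return np.mean(updates, axis=0)

# Toy demo: three "devices", each holding private data the server never sees.
rng = np.random.default_rng(0)
devices = [(rng.normal(size=(20, 3)), rng.integers(0, 2, size=20).astype(float))
           for _ in range(3)]

w = np.zeros(3)
for _ in range(10):
    w = federated_round(w, devices)   # only model parameters cross the network
```

Note that the server only ever receives weight vectors; the raw interaction data stays on each device.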
Let’s look at a real-world example of a data collection sequence that uses on-device learning:
On-device learning is not perfect from a privacy perspective. When model parameters leave users’ devices, they can still leak information about the underlying local training data, so the risk of sensitive information being shared is only reduced, not completely eliminated. To address this, on-device learning is often combined with other PETs such as differential privacy and secure computation, which we will cover in other posts on our blog.
In today's data-driven world, concerns about privacy and data security have never been more critical. k-Anonymity is a privacy concept and technique that plays a pivotal role in safeguarding sensitive data. Let’s explore what k-anonymity is and how it’s used to protect personal information.
k-Anonymity is a privacy model designed to protect the identities of individuals when their data is being shared, published, or analyzed. It ensures that data cannot be linked to a specific person by making it indistinguishable from the data of at least 'k-1' other individuals. In simpler terms, k-anonymity hides personal information within a crowd, making it infeasible to single out a particular individual.
The 'k' in k-anonymity represents the minimum number of similar individuals (or the “anonymity set”) within the dataset that an individual's data must blend with to guarantee their privacy. For example, if k is set to 5, the data must be indistinguishable from at least four other people's data.
To implement k-anonymity, data must be generalized to make it less identifiable, while ensuring that each data point is identical to a minimum of ‘k-1’ other entries. This is commonly done through two methods: generalization, which replaces specific values with broader categories (for example, an exact age with an age range), and suppression, which removes or masks identifying values entirely.
Online retailers use k-anonymity to protect customer data while analyzing purchase histories and preferences to enhance their services and recommendations.
For example, individual users can be assigned to data cohorts, based on their interests, directly on their mobile device. An advertiser can then target individuals in specific cohorts. This way, the advertiser does not learn any personally identifiable information (PII); it only learns that a specific individual belongs to certain cohorts. And as long as the cohorts are k-anonymous, they protect users from re-identification, especially for large values of k.
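The k-anonymity property itself is easy to check mechanically: group records by their quasi-identifiers and verify that no group is smaller than k. The sketch below does exactly that; the cohort records and field names are invented for the example:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """A dataset is k-anonymous over the chosen quasi-identifiers if every
    combination of their values is shared by at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical cohort records; field names are made up for illustration.
cohorts = [
    {"age_range": "30-39", "region": "QC", "cohort": "cycling"},
    {"age_range": "30-39", "region": "QC", "cohort": "cooking"},
    {"age_range": "40-49", "region": "ON", "cohort": "gardening"},
    {"age_range": "40-49", "region": "ON", "cohort": "cycling"},
]

print(is_k_anonymous(cohorts, ["age_range", "region"], k=2))  # True
print(is_k_anonymous(cohorts, ["age_range", "region"], k=3))  # False
```

Here each (age_range, region) combination is shared by two records, so the dataset is 2-anonymous but not 3-anonymous over those quasi-identifiers.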
A drawback to using k-anonymity is that sometimes revealing just the cohort a user belongs to can leak sensitive information about them. This is especially true when the cohorts are based on sensitive topics such as race, religion, or sexual orientation. A simple solution to this problem is to use predefined and publicly visible cohort categories, as in Google Topics.
In any case, cohorts can still be combined or correlated and used to re-identify users across multiple sites. That said, k-anonymity is often combined with other privacy protections to further reduce the probability of re-identification.
As people spend more and more time online, consumers have demanded more control over their digital privacy. They’ve become particularly uncomfortable with digital tracking technology like third-party cookies that enable marketers to gather information about their browsing behavior. But eliminating third-party cookies puts marketers in a tough spot. Their businesses have relied on cookies to find new customers for over two decades.
Government agencies in the US and Europe have responded to consumer demands by enacting regulations that give users more protection and control over how their data is collected and processed. Many web browsers have already phased out third-party cookies. Google has been the last holdout, and it is expected to fully phase out cookies by the end of 2024.
But simply eliminating cookies won’t solve the privacy protection problem for consumers. Digital footprints are always expanding and companies need to be more vigilant than ever about protecting their customers’ data. There’s an enormous opportunity to build an ad ecosystem that respects users' privacy more than ever.
Privacy Enhancing Technologies (PETs) have emerged as a crucial ally for safeguarding consumer data. These emerging technologies use advanced cryptographic and statistical techniques to protect consumer information while still allowing marketers to glean valuable insights.
PETs are a set of tools and methods designed to help organizations maintain digital privacy. They provide a layer of defense against unwanted surveillance, data breaches, and unwarranted data collection by enhancing user control and safeguarding data during its lifecycle. PETs are instrumental in upholding privacy, security, and freedom in the digital realm.
There are several types of PETs being used throughout the digital advertising ecosystem:
PETs will play a vital role in creating an advertising ecosystem that is primarily privacy focused. Optable is exploring the use of multiple types of PETs as we build a privacy-safe environment where clients can safely collaborate with their data partners. The following blog series will demystify the complex world of PETs and take a closer look at how advertisers are using them.
At Optable we view interoperability first and foremost through the lens of digital advertising’s critical systems. And when you consider the systems used for ad campaign planning, activation, and measurement, you quickly realize that these systems were all inherently interoperable for a long time thanks to widespread data sharing. With identity and data sharing on their way out for a variety of reasons, new ways of interoperating within each of these systems are required. Clean rooms are a way to achieve data interoperability in advertising, and that’s why we have invested significantly in this area.
But the trouble with clean rooms is that both parties have to agree to use the same one in order to interoperate. The central idea with clean room technologies is that two or more parties come together around a neutral compute environment, enabling them to agree on operations to perform on their respective datasets, on the structure of their input datasets, on the outputs generated by the operations and, importantly, on who has access to the outputs. Additionally, various privacy enhancing technologies may be used to limit and constrain the outputs and the information pertaining to the underlying input datasets that is revealed.
So, what does true interoperability look like for data collaboration platforms, built from the ground up for digital advertising? Here are three important pillars:
✅ Integration with leading DWH clean room service layers. A DWH clean room service layer is the set of primitives (APIs and interfaces) made available by leading DWHes (Google, AWS, Snowflake, etc.) that enables the joining of disparate organizations’ datasets and purpose-limited computation. Optable streamlines this by automating the flow of minimized data to/from DWHes, and by federating code to these environments. The end result? A collaborator with audience data sitting in Snowflake can easily match their audience data to an Optable customer's first-party data, all within Snowflake, using Snowflake DCR primitives to establish trust, without the Optable customer lifting a finger. In this example the matching itself happens inside of Snowflake, but the same can be done with other DWH clean room service layers as well.
✅ Compatibility with open, secure multi-party compute protocols like Private Set Intersection (PSI). What if your partner wants to match their audience data with you but they cannot move their data into a cloud-based DWH? SMPC protocols such as PSI enable double-blind matching on encrypted datasets, without requiring decryption of the data at any point. Open-source implementations provide an independently verifiable, albeit purpose-constrained, clean room service layer. The end result? A collaborator with audience data sitting on premise can execute an encrypted match with an Optable customer using a free, open-source utility.
✅ Built-in entity resolution, audience management and activation, with deep integration to all major cloud and data environments. In the real world, few organizations have all of their user data assets neatly connected in a single environment. Sure, they exist, but more often than not, organizations need to do quite a bit of work to gather, normalize, sanitize, and connect their user data so that they can effectively plan, activate, and measure using data collaboration systems. It’s therefore no wonder that when the IAB issued their State of Data report earlier this year, respondents cited time frames of months up to years to get up and running with clean room tech! Moreover, even when one company has got their user data together, their partners often require help with entity resolution. These are the reasons why Optable makes it easy to connect user data sitting in any cloud environment or system into a cohesive and unified user record view, out of the box, with no code required. Got part of your user data in your CRM? And another sitting in cloud storage? And another in your DWH? No problem.
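The double-blind matching enabled by PSI protocols can be illustrated with a toy commutative-encryption exchange. This is a sketch only: it uses modular exponentiation over a public prime for readability, whereas real open-source PSI utilities rely on vetted elliptic-curve cryptography, and all identifiers and keys below are made up for the example:

```python
import hashlib
import secrets

# Toy commutative "encryption": modular exponentiation over a public prime,
# so that E_a(E_b(x)) == E_b(E_a(x)). Real PSI utilities use vetted
# elliptic-curve implementations; this prime is for illustration only.
P = 2**127 - 1  # a Mersenne prime

def h(item: str) -> int:
    """Hash an identifier into the group."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

def enc(value: int, key: int) -> int:
    return pow(value, key, P)

# Each party draws a private key and encrypts its own hashed identifiers.
key_a = secrets.randbelow(P - 2) + 1
key_b = secrets.randbelow(P - 2) + 1
set_a = {"alice@example.com", "bob@example.com"}    # party A's list
set_b = {"bob@example.com", "carol@example.com"}    # party B's list

once_a = {enc(h(x), key_a) for x in set_a}   # A encrypts, then sends to B
once_b = {enc(h(x), key_b) for x in set_b}   # B encrypts, then sends to A

# Each party re-encrypts the other's values; commutativity makes the doubly
# encrypted values comparable without either plaintext ever being revealed.
twice_a = {enc(v, key_b) for v in once_a}
twice_b = {enc(v, key_a) for v in once_b}

print(len(twice_a & twice_b))  # 1: only the shared identifier matches
```

Because every value is encrypted under both keys before comparison, neither party ever sees the other's raw identifiers; only the overlap is learned.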
At Optable, we believe that these pillars are the groundwork on top of which interoperability can happen, and we’re partnering with industry peers who share the same vision. Stay tuned for more exciting announcements on this front!
One of the most common misconceptions about data clean rooms and data collaboration is that they require large volumes of identified data.
Most publishers we meet have this concern: “Do we really have enough data to drive significant revenue? Won’t we be limited by the size of the match, and therefore won’t be able to run any media at scale?”
Typically, they are surprised to learn that mitigating low volumes of identified data is part of the solution offered today by this class of data collaboration technology.
No matter how little identified data any given publisher has, they can benefit from growth using data collaboration technologies. The reason is quite simple: any campaign is better off when it starts with real data.
Following a match with an advertiser, the publisher has a few options. The first, and simplest, is to derive insights on the matched audience. The publisher can better understand the brand’s customers or prospects as a function of their own data, which in turn allows them to create better media products. It also shows the brand that the publisher reaches the right audience for them. Insights are offered as a report that provides aggregate numbers – by definition, a privacy-safe product.
The second, and more important, option is to create a prospecting audience out of the match. Optable’s prospecting clean room app automatically creates an expanded audience that provides scale, performance, and value when it comes to reaching the right audience. Not only that, but we do it in a privacy-safe manner, since the publisher does not learn the intersection – only the prospecting audience becomes eligible for targeting.
Considering that a publisher’s audience consists of both identified and unidentified users who share a number of traits, Optable’s prospecting clean room app allows a publisher to configure a model that ultimately creates an addressable audience sizable enough to drive significant growth.
For brands, the use of customer or prospect data isn’t a limiting factor either – in fact, few brands can boast having significant data on all of their customers. For everyone else, the objective is to have some data – enough to allow our systems to make better audience decisions.
We make publisher-driven data collaboration easy for all parties: our end-to-end solution includes direct integration for activation straight from the clean room environment, and offers frictionless interoperability.
Given the emergence of retail media and the democratization of data through data warehouse clean room APIs, data collaboration is quickly becoming a major revenue opportunity.
Forward-looking publishers seeking revenue growth must prioritize future-proof, privacy-safe solutions for driving revenue.
Canadian news and journalism outlets have entered into a fierce battle with Google and Meta over the recently enacted Bill C-18, also known as the Online News Act. This legislation, passed by the Canadian government on June 22, 2023, aims to support the Canadian journalism ecosystem by establishing a tax that "digital news intermediaries" such as Google and Meta must pay to the content owners they link to.
In a familiar pattern observed with similar laws, such as Australia's News Media and Digital Platforms Mandatory Bargaining Code, Meta and Google have retaliated by removing links from their platforms, including Instagram, Facebook, and Google Search. Unfortunately, this response undermines the very essence of the bill and is expected to inflict financial harm on Canadian journalism. While Google and Meta argue that they only seek a fair market share for their services, publishers contend that this is unjustified, since Google and Meta generate billions in advertising revenue while journalists struggle to make ends meet.
The dynamics at play here are further complicated by the fact that media agencies and brands, responsible for a significant portion of news media revenues, control advertising spend. This advertising spend is the primary source of revenue for Google & Meta, which famously represent 80% of online advertising revenue in the country.
Traditionally, Canadian brands and their agencies have allocated the majority of their advertising budgets to these two companies. However, there is a growing trend, driven by recent legislation and broader shifts in advertising, to invest media dollars directly with local publishers. Many agencies and brands have committed to supporting Canadian publishers in light of this impasse. For example, the A2C in Quebec has already taken steps to incentivize collaboration between agencies, brands, and local publishers. Some agencies view this issue as a matter of ethics and social responsibility. Prominent figures in the agency world, like Sarah Thompson, President of Dentsu Media, and Brian Cuddy, SVP Responsible Media Solutions at Cossette, have been vocal advocates for supporting Canadian news publishers. In response to the announcement from Facebook that all Canadian news will be removed from its platforms within weeks, Sarah took to LinkedIn to share support for local news: “We are at a moment of time where action is required to support local owned media, which is more than news.”
In addition to developments within the Canadian ecosystem, there are emerging trends in how marketers allocate their paid media budgets. Advertising executives are increasingly interested in investing more heavily in contextual advertising and leveraging publishers' first-party data for better targeting. There is also heightened scrutiny around programmatic channels, which lack transparency in terms of media ROI. Consequently, there is a growing preference for direct buying. Moreover, measurement strategies are shifting away from the digital attribution focus of the past decade towards more traditional methods, such as brand lift analysis, media mix modeling, third-party audience measurement, and the use of consumer research data and studies.
In essence, these trends indicate a change in the attitudes and choices of CMOs and agency leaders. They are actively supporting a more open and equitable internet through their advertising investments.
As with similar legislation elsewhere, it is probable that Google and Meta will end up paying millions of dollars directly to media owners to avoid taxation. However, finalizing these deals will take time, leaving publishers to suffer from decreased traffic and increased competition with these tech giants for ad revenue. In the long run, Google and Meta might modify their platforms by removing links entirely. The economic landscape has evolved for these companies, and it is not unreasonable to view their initial link removal as a test to assess the long-term effects on user engagement and potential revenue.
To minimize risk, publishers can take proactive measures to future-proof their businesses.
Here are some recommendations:
Canadian publishers are witnessing promising support from agencies, brands, and the public, indicating a positive trajectory. Coupled with the growth of future-proof data collaboration technologies, this presents remarkable opportunities for news media publishers to revolutionize their advertising revenue generation. The Online News Act, a legislation that foreshadows the future of news consumption, holds great significance not only for Canadians, but also for Americans, as similar bills have reached Congress. In the midst of these advancements, we find ourselves at a critical juncture for the open internet, journalism, and democracy as a whole. Numerous Canadian publishers have already partnered with Optable to safeguard their advertising businesses, and for those who haven't, we are prepared to provide our assistance!
Today the IAB Tech Lab is publishing version 1.0 of the Open Private Join and Activation (OPJA) clean room interoperability standard. Throughout the past year, together with a growing number of industry collaborators and members of the Tech Lab’s Privacy Enhancing Technologies (PETs) and Rearc Addressability working groups, our team played a leading role in developing OPJA with the goal of enabling interoperable privacy safe ad activation based on PII data.
Beyond our work on the initial proposal, we have several broader goals with OPJA:
While we think that there is room for clean room vendors and collaboration platforms to offer their own proprietary spin on the activation use case (many already do), we’re hoping that they will make an effort to evaluate and align their implementations to better adhere to OPJA, and we intend to make it easy for them to do so.
In order to achieve our goals, agreeing on an independently trustable manner in which user data can be matched and activated in the multi clean room vendor setting was imperative.
Doing this work in the open is essential, as it ensures that it is widely accessible and that any vendor can contribute ideas and review the proposed protocols and technologies. Open-source promotes transparency, collaboration, and inclusiveness in the development process. We believe that providing a common foundation that anyone can access, modify, and contribute to is essential to achieving interoperability between all vendors, instead of a select few.
We decided to focus our initial interoperability standards efforts on the activation use case not only because it is a frequently encountered use case in industry, but also because we have noticed confusion regarding the extent to which user information is exchanged between parties that enable the use case in proprietary ways today.
On the surface, activation of overlapping audiences matched using a clean room is straightforward. Consider the case of an advertiser with a list of customers that wants to display ads to those customers when they are interacting with a publisher’s websites or applications. If users have provided personally identifying information, such as their email address, to both the publisher and advertiser directly, then the advertiser and publisher can compare datasets in a clean room in order to construct an audience of overlapping users. Here’s a Venn diagram illustrating the operation:
While seemingly simple on the surface, when it comes to the sharing of information associated with individual users, there are several subtle but material differences that may arise when such an operation is performed in practice. Notably, what new user information could the advertiser and publisher parties learn as a result of performing the match and targeting operation? Will the advertiser be able to track which of its individual customers are also browsing the publisher’s websites? And will the publisher learn which of its registered users are also the advertiser’s customers?
To answer such questions, a standard set of security and privacy design goals, input and output requirements, and clear documentation regarding the extent to which private user information is exchanged between parties when enabling the ad activation use case were all elaborated and made part of the OPJA specification. Ultimately, our goal with OPJA is to enable ad targeting on overlapping users without the parties leaking user information to each other. This is not only good for end user privacy, but it also prevents data sharing that could be exploited by competitors.
A defining characteristic of clean rooms is their potential to limit the scope of the processing of user data controlled by multiple parties. A simple example of this in practice is the construction of an aggregate report describing the intersection of two audiences originating from separate parties. In such a report, the joining, grouping, aggregation, and statistical noise injection can all be performed in a data clean room, thus preventing either party from learning anything about the other party’s data, other than what is included in the prescribed report.
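That limiting capability can be made concrete with a minimal sketch of a noisy overlap report: the join and aggregation happen "inside" the clean room function, and only a noised count comes out. The Laplace noise and audience data here are purely illustrative; real clean rooms use carefully calibrated differential privacy mechanisms:

```python
import random

def noisy_overlap_report(audience_a, audience_b, epsilon=1.0):
    """Join two audiences inside the clean room, then release only an
    aggregate count with Laplace noise added, so neither party can learn
    anything about individual records from the report."""
    true_overlap = len(set(audience_a) & set(audience_b))
    # The difference of two exponentials is a Laplace(scale=1/epsilon) sample.
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return max(0, round(true_overlap + noise))

# Hypothetical audiences; the true overlap is 500 users.
advertiser = [f"user{i}" for i in range(1000)]
publisher = [f"user{i}" for i in range(500, 1800)]
print(noisy_overlap_report(advertiser, publisher))  # close to 500, never the raw list
```

Each party learns an approximate overlap size, but never which users are in the intersection.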
This limiting capability of data clean rooms is inherent in the activation matching operation prescribed by the OPJA specification. In OPJA, a secure match is performed in order to determine which individual users are in the intersection of audiences originating from an advertiser and a publisher. Rather than the list of matched users being shared with either party, the presence or absence of each user in the intersection is encoded in the form of a label and is then encrypted. These encrypted user labels are shared with the publisher, who cannot decrypt them, but who is able to insert them into ad requests. Ad requests are processed by ad tech (SSPs and DSPs), and only the advertiser’s designated DSP can decrypt the corresponding match labels, enabling the DSP to decide whether and how much to bid for the opportunity to show an ad. Critically, PII such as email addresses or phone numbers is never shared or transferred in ad requests, or outside of the match operation.
Equally important is that thanks to label encryption, OPJA allows the hiding of information about which individual users are in the audience intersection from both the advertiser and the publisher. This reduces data leakage between advertisers and publishers, and enables remarketing without requiring user tracking. Fundamentally, it’s an approach that adheres to the data minimization and purpose limitation principles of privacy by design.
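To make the label mechanics concrete, here is a toy sketch of encrypting a one-bit match label so that only the DSP can read it. This is not the OPJA wire format: it uses an HMAC-derived one-time pad for brevity where a real deployment would use a vetted AEAD cipher, and the key handling is entirely illustrative:

```python
import hashlib
import hmac
import secrets

# Key shared between the match helper and the advertiser's DSP only; the
# publisher never holds it. All key handling here is illustrative.
DSP_KEY = secrets.token_bytes(32)

def encrypt_label(in_intersection: bool) -> bytes:
    """Helper side: encrypt a one-bit match label for the DSP. A random
    nonce ensures that equal labels do not produce equal ciphertexts."""
    nonce = secrets.token_bytes(16)
    pad = hmac.new(DSP_KEY, nonce, hashlib.sha256).digest()[0]
    return nonce + bytes([pad ^ int(in_intersection)])

def dsp_decrypt(blob: bytes) -> bool:
    """DSP side: recover the label to decide whether and how much to bid."""
    nonce, ct = blob[:16], blob[16]
    pad = hmac.new(DSP_KEY, nonce, hashlib.sha256).digest()[0]
    return bool(pad ^ ct)

# The publisher only ever sees the opaque blob and forwards it in ad requests.
blob = encrypt_label(True)
print(dsp_decrypt(blob))  # True: the DSP learns the label; the publisher cannot
```

Because every blob carries a fresh nonce, the publisher cannot even tell which of its users carry a "matched" label versus an "unmatched" one.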
OPJA outlines two approaches enabling the matching of user PII data in the multi-vendor setting, and both are based on Privacy Enhancing Technologies (PETs). The first is a purely software-based, delegated private set intersection. This method enables the comparison of encrypted datasets using commutative encryption, without ever decrypting the data. The delegated helper server cannot decrypt the match data and is used merely to execute the data comparison and generate encrypted data for activation. Additional trust in the helper server could be provided through hardware-based remote attestation.
The second approach is based on hardware-provided Trusted Execution Environments (TEEs). This method ensures that match data is encrypted exclusively for the secure processing hardware provided by a helper server.
The use of PETs offers a robust foundation for establishing trust between vendors in how user data is matched. OPJA matching requires that the data remain protected with encryption during processing, through a combination of cryptographic software and TEE hardware. This greatly reduces the number of things that vendors and service providers need to trust each other with.
OPJA’s matching approaches are also not limited in theory to a single cloud or infrastructure environment. These characteristics make PET-based approaches strong candidates for matching interoperability in the multi-vendor setting.
You can read the OPJA specification as well as the IAB Tech Lab Data Clean Room Guidelines here. Additionally, here's the Tech Lab's latest announcement on the 1.0 spec release.
For a fun introduction to OPJA, check out Digiday’s excellent WTF is IAB Tech Lab’s Open Private Join and Activation?
For a simple walkthrough on how commutative encryption can be used to enable double blind matching (not specific to OPJA), have a look at the little explainer here.
If you’re a data or ad tech vendor (SSP, DSP, ad server) interested in interoperating with the Optable data collaboration platform using OPJA, we’d love to hear from you. Drop us an email.
Finally, it’s our hope that OPJA is a catalyst for future open proposals associated with measurement, audience modelling, and other use cases that involve the sharing of sensitive user data between advertisers and publishers.
As third-party cookies and other public personal identifiers die a slow death, digital advertising is crying out for a new and better approach to replace them. To fill the void, data clean rooms are increasingly being recognised as the answer, with advertisers seeking a solution that delivers actionable data within a privacy-safe framework.
However, their success, like the efficiencies and scale they offer, depends on their ability to link systems, platforms, and partners. This makes delivering a frictionless approach critical for the growth of data clean rooms.
So why are data clean rooms seen as the saviour of digital advertising? It’s because they can power the whole advertising process, from delivering insights to campaign planning, activating, targeting, and measuring.
At the heart of a data clean room approach are partnerships: between advertisers, publishers, data providers, and vendors. Ensuring these partnerships can flourish requires frictionless collaboration that works on two levels. Firstly, the partnership level, with publishers and advertisers coming together to work with whomever they choose and linking seamlessly in a secure, privacy-compliant environment.
Secondly, the technology level, by allowing data clean rooms to connect easily with any relevant third-party platform. After all, companies want to avoid frustrations around time-consuming, bespoke implementations or the inability of platforms to interact every time they seek to engage in a partnership.
So while security, privacy and compliance are central to data clean rooms, they must remove the barriers that stifle this collaboration.
That’s because there are huge inherent benefits in creating data clean rooms that allow parties to collaborate directly, irrespective of the platforms they use. It’s why we’ve focused effort and resources from the start on building our technology so users can create data clean rooms then quickly execute partnerships by inviting businesses to connect without the other party needing to be a user of our platform. Real value is gained by offering an easy solution for achieving this platform-agnostic collaboration.
Ultimately, any technology must be an enabler that eradicates complexity. For data clean rooms, this comes down to interoperability.
Ensuring platforms can connect and communicate with each other removes the friction around companies cooperating. Whether it’s integrating with other data clean rooms or collaborating with partners without their data having to leave their data warehouse, interoperability marks the next phase of building a frictionless environment.
Interoperability, by its very nature, requires consensus. Time and again in ad tech, fragmentation and silos have hindered the ability of the market to evolve and prosper. However, we’re seeing the industry begin to tackle this, with the IAB Tech Lab currently drafting a set of data clean room standards. This is a recognition of the importance of data clean rooms and represents an essential starting point for interoperability, because it’s through agreed standards that interoperability becomes possible.
With more businesses looking to test data clean rooms, central to their success is ensuring it’s easy to establish partnerships and that technologies are mutually compatible. Ultimately, taking down the barriers, removing friction, fostering collaboration, and embedding interoperability means data clean rooms can fulfil their potential and support the future needs of the digital advertising industry.
To work the way they should, data clean rooms need to bring a fluid, real-time, embeddable infrastructure to data collaboration. And at the heart of such an offering, there needs to be an API that allows any client to deploy the data clean room approach across any inventory, any type of audience data and any third-party cloud provider.
In this way, any third-party application or platform should be able to benefit from a data clean room by embedding its API for secure, privacy-preserving data collaboration.
This in turn enables a complete digital media workflow via API, and taking Optable’s service as an example, it looks like this:
One of the best applications of a data clean room API is in combination with a customer data platform (CDP). An API can be used to properly leverage audience data housed in a CDP, making this data actionable for activation and measurement with third parties.
Another good example involves walled garden data and inventory. Whether it’s for CTV, audio or traditional web formats, an API can be used to effectively drive advertiser performance anchored in real customer data.
Ultimately, the API is here to make it easy to leverage the data clean room approach in any third party platform or application.