Re: [proposals] Interoperable Private Attribution (IPA) (#2) from benjaminsavage via GitHub on 2022-02-15 (public-patcg@w3.org from February 2022)

From: benjaminsavage via GitHub <sysbot+gh@w3.org>
Date: Tue, 15 Feb 2022 06:35:52 +0000
To: public-patcg@w3.org
Message-ID: <issue_comment.created-1039909389-1644906950-sysbot+gh@w3.org>

A few comments in response to the thread so far:

1. On the topic of "what if no match keys are set"

As @eriktaubeneck said - I like the idea of the device just generating a random matchkey. That way the API seamlessly defaults to "same-device-only" attribution, which is at least at par with other proposals.

2. On the topic of "Who can use a match key once it is set?"

The reason we proposed allowing any company to benefit from match keys set by any other participant was specifically to avoid any kind of system which could be abused by large established players. As @[santirely](https://github.com/santirely) mentions, letting the entity that sets a match key control who can use it would give large players a lot of leverage to, as he says, choose to only share access with businesses who "play well with them". We opted for an "open reference" proposal specifically to avoid this type of risk.

3. On the topic of: "What incentive would a company with a large footprint have for setting an open-reference match key?"

As @eriktaubeneck points out, browsers and mobile operating systems are rapidly clamping down on "tracking". Various regulations are doing the same. This means that all businesses (even those with a large footprint) are steadily losing the ability to accurately count the number of conversions attributable to advertising. In a theoretical future world where cookies and device identifiers are all gone, and fingerprinting is impossible, having a "large footprint" will be _useless_ from the perspective of counting conversions which occur off-network on other apps and websites. In such a world, if the only option available for counting conversions is a highly private one, like IPA, then I believe businesses who sell ads will use it (they won't have a choice). In that world, they'll have two options:
(i) Do not set a match key. Use a match key set by some other entity
(ii) Set a match key - accepting that anyone else who wants to can also use it.

Each entity will have to weigh these alternatives. For a business with a "large footprint" of users who sign in across multiple devices, here is how I think these choices will look:
(i) **Do not set a match key:** If the other match keys come from businesses with a smaller network of users logged in across devices, taking this approach will have the undesirable side-effect of *undercounting* the true number of conversions their ads actually drive. In summary: Less accurate measurement.
(ii) **Set a match key:** This will result in more accurate ads measurement - with higher counts of attributed conversions, which more accurately reflect the number of conversions their ads drive. As a side-effect, however, all competitors will **also** benefit from more accurate measurement of their ads. In summary: More accurate, but more accurate for everyone.

I posit that there exist businesses for whom the calculus is in favor of option (ii), more accurate measurement being more beneficial than everyone having less accurate measurement.
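
To make the undercounting argument concrete, here is a toy sketch. Everything in it is hypothetical: the events, the user and device names, and the helper functions are invented for illustration, and it ignores the encryption and multi-party computation that IPA would actually use. It only counts how many conversions can be joined to an ad impression under three kinds of match key: a random per-device key, a key from a provider with few cross-device logins, and a key from a provider with many.

```python
# Toy model only: match keys are plain tuples here. In the proposal they
# would be encrypted and never visible to websites or apps.

def attributed_conversions(impressions, conversions, match_key_of):
    """Count conversions whose match key equals that of some impression."""
    seen = {match_key_of(user, device) for user, device in impressions}
    return sum(
        1 for user, device in conversions
        if match_key_of(user, device) in seen
    )

def same_device_key(user, device):
    """Random per-device key: only events on the same device can match."""
    return ("device", user, device)

def make_provider_key(logged_in_users):
    """Hypothetical provider whose key links all devices of logged-in users."""
    def key(user, device):
        if user in logged_in_users:
            return ("user", user)
        # Users unknown to the provider fall back to same-device matching.
        return ("device", user, device)
    return key

# Hypothetical (user, device) events, invented for illustration.
impressions = [("u1", "phone"), ("u2", "phone"), ("u3", "laptop")]
conversions = [("u1", "laptop"), ("u2", "phone"),
               ("u3", "phone"), ("u4", "laptop")]

small_footprint = make_provider_key({"u1"})
large_footprint = make_provider_key({"u1", "u3"})

counts = {
    "same-device only": attributed_conversions(
        impressions, conversions, same_device_key),
    "small-footprint key": attributed_conversions(
        impressions, conversions, small_footprint),
    "large-footprint key": attributed_conversions(
        impressions, conversions, large_footprint),
}
print(counts)
```

In this made-up data set three of the four conversions really did follow an ad impression, but only the large-footprint key can join all three. That is the trade-off in option (ii): the entity setting that key gets the better count, and so does everyone else who uses it.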

4. On the topic of "does this require users to be logged into Facebook?"

In the proposal, we talk about the prospect of supporting **multiple match keys**. We think we can support this without needing to give up any privacy benefits. If that is true, then it would seem optimal for any consumer of this API to select a basket of match keys which collectively provide good coverage. This has the additional benefit of minimizing reliance on a single point of failure. I can envision a future where it is common to specify a handful of "large footprint" match key providers to get a good baseline, a few region-specific ones to cover parts of the globe which would otherwise be poorly covered, potentially one's own match key, and finally a fallback to the random, device-generated match key, which essentially just provides "same-device only" attribution.
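
As a rough illustration of that fallback idea, here is a minimal sketch. It is not the API from the proposal: the function names, the provider names, the 64-bit key size, and the "first available provider wins" rule are all assumptions made for this example, and a real implementation would keep match keys encrypted rather than returning them in the clear.

```python
import secrets

def device_random_match_key(bits=64):
    """Hypothetical device-generated key, used when no provider set one."""
    return secrets.randbits(bits)

def resolve_match_key(preferred_providers, keys_on_device, device_fallback_key):
    """Pick the first preferred provider that has set a key on this device.

    preferred_providers: ordered list chosen independently by the advertiser.
    keys_on_device: mapping of provider name -> match key set on this device.
    device_fallback_key: the random per-device key (same-device attribution).
    """
    for provider in preferred_providers:
        key = keys_on_device.get(provider)
        if key is not None:
            return provider, key
    return "device-random", device_fallback_key

# Example basket with made-up provider names: a large-footprint provider,
# a regional one, then the advertiser's own key.
basket = ["large-provider.example", "regional-provider.example",
          "advertiser.example"]
fallback = device_random_match_key()

# A device where only the regional provider has set a match key.
print(resolve_match_key(basket, {"regional-provider.example": 1234}, fallback))

# A device where nobody has set a match key: same-device-only attribution.
print(resolve_match_key(basket, {}, fallback)[0])
```

The point of the ordered list is that each advertiser can choose its own basket without any of the providers having to coordinate with each other.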

I think all parties (including "large footprint" entities) would have similar incentives pushing them in this direction.

We've also put a lot of time and thought into trying to ensure there isn't coupling between entities. We think we can design the system in such a way that we do not require collaboration. That is, we want a system where any advertiser who runs ads across N platforms can **independently** specify which match keys they want to use, without needing those platforms to all agree with them, or to all agree on something among themselves.

5.
> As long as the cryptographic stuff works and the ad networks are somehow coerced into dropping their other tracking methods, this is a big step up. But on the other hand if the crypto stuff has a hidden weakness in it and Facebook run one of the "trusted" servers, this is a terrible idea

First of all, I assume that Facebook / Google / any ad-tech company will never be trusted to operate a helper server =). This will be enforced by browsers. They'll have to decide which public keys they are willing to use to encrypt reports. I cannot imagine a world in which Firefox would trust Facebook enough to encrypt these events using Facebook's public key =). I'm assuming we will see non-profits with strong privacy reputations operating the servers, or possibly the types of organizations which operate Apple's "Private Relay" service.

Secondly: Yes, exactly. This proposed system would be a **big** step up for privacy compared to the status quo mechanisms used to count conversions. I have no expectation that browsers and mobile operating systems will stop trying to clamp down on fingerprinting. Actually, if anything I expect them to accelerate those efforts. I also expect to see more and more regulation along these lines.

The key is that the math works out and that we have a strong privacy guarantee. This is why we are trying to work out in the open - we think that's the best way to find all the problems / issues, and to get help finding solutions to them. We've already benefitted tremendously from outside input. @betuldurak found a really clever attack that a malicious helper node could carry out. I'm really grateful to her for telling us about it! We're working on finding a solution as we speak.

I think the path towards standardization looks like a bunch of iterations out in the open, publishing papers, getting feedback, addressing problems, repeat. I hope that we can eventually converge on a design that is super solid. I wouldn't expect browser vendors to feel comfortable shipping an API like this unless a bunch of independent academics were all convinced that it met our design goals.

--
GitHub Notification of comment by benjaminsavage
Please view or discuss this issue at https://github.com/patcg/proposals/issues/2#issuecomment-1039909389 using your GitHub account

--
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Tuesday, 15 February 2022 06:35:54 UTC