- From: Martin Thomson <notifications@github.com>
- Date: Mon, 12 May 2025 22:34:53 -0700
- To: w3ctag/design-reviews <design-reviews@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <w3ctag/design-reviews/issues/878/2875099303@github.com>
martinthomson left a comment (w3ctag/design-reviews#878)

Looking at this, I wonder if there isn't a far simpler solution:

* If the system is loaded (i.e., in those cases where confidence would be "low" in the proposed design), the API might fail.
* To give sites some baseline failure rate, so that failures are not directly correlated with heavily loaded clients, have some non-trivial but random rate of failure; 5% might do for that. The only purpose of that is to establish that failures are completely normal and that there is no expectation that any given visit will result in a report. It might even be possible to go higher.

Not submitting data like that would also be more private than what is proposed, because it doesn't reveal timing information when the browser is loaded. A loaded browser is likely a situation where more reports are sent out, as multiple windows are loading at the same time, which means that correlation between low-confidence reports might reveal good cross-site correlation information, even with randomized response.

The consequence of that would be that you would generally only get performance metrics from browsers that are not under stress. The risk is that you might incentivize manual data collection methods if you don't provide metrics. After all, most of the underlying factors are observable anyway, but my guess is that most sites wouldn't bother to collect noisy metrics like that, and I'd guess that most page loads happen on machines that aren't in this heavily loaded state.

This came from me looking at the "differential privacy" solution being proposed. In doing so, I noticed that the approach lacked the necessary sensitivity analysis. It didn't identify the privacy unit; I'd assume that it was the report, whereas the real privacy risk comes from a privacy unit with a larger scope, like the user, or the user plus site. Covering each page load doesn't help if your noise can be averaged out by having multiple reports generated.
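[To make the averaging point concrete, here is a toy sketch of my own, not the proposal's actual mechanism. It assumes per-report Laplace noise with a made-up value, sensitivity, and epsilon; the names are hypothetical. If the noise is sized to protect a single report, a site that can trigger many reports from the same user can average the noise away.]

```python
# Toy illustration (assumed values throughout): per-report Laplace noise
# protects one report, but the mean of many reports recovers the true value.
import random

random.seed(1)  # deterministic for the example

true_value = 0.7          # hypothetical stable per-user quantity
epsilon = 1.0             # hypothetical per-report privacy budget
sensitivity = 1.0
scale = sensitivity / epsilon

def noisy_report(value: float) -> float:
    # Laplace(scale) noise, drawn as the difference of two exponentials.
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return value + noise

single = noisy_report(true_value)                      # one report: very noisy
reports = [noisy_report(true_value) for _ in range(10_000)]
average = sum(reports) / len(reports)                  # noise averages out

print(f"one report off by   {abs(single - true_value):.3f}")
print(f"mean of 10k off by  {abs(average - true_value):.3f}")
```

[With 10,000 reports the mean lands within a few hundredths of the true value, which is why the privacy unit has to be the user (or user plus site), not the individual report.]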
(In other words, I don't think that claims about this providing differential privacy are good without more analysis.)

-- Reply to this email directly or view it on GitHub: https://github.com/w3ctag/design-reviews/issues/878#issuecomment-2875099303
Received on Tuesday, 13 May 2025 05:34:57 UTC