Re: WebPerfWG call - October 12th 8am PT/11am ET/5pm CET from Nic Jansma on 2023-11-06 (public-web-perf@w3.org from November 2023)

From: Nic Jansma <nic@nicj.net>
Date: Mon, 6 Nov 2023 09:09:58 -0500
To: Yoav Weiss <yoavweiss@google.com>
Cc: public-web-perf <public-web-perf@w3.org>, "Jansma, Nic" <njansma@akamai.com>, Carine Bournez <carine@w3.org>
Message-ID: <CAALun4sqHz8BUVQGwO7KnrF9BN2GwMykOZdwqRstg9Be_joeVg@mail.gmail.com>

(Apologies for the delay in posting these)
Minutes are now available:
Linked to from our WebPerf WG Agenda document
<https://docs.google.com/document/d/10dz_7QM5XCNsGeI63R864lF9gFqlqQD37B4q8Q46LMM/edit#>
Published to the web-performance Github meetings page
<https://w3c.github.io/web-performance/meetings/>
... and copied below:
Participants

Alex Christensen, Nic Jansma, Dan Shappir, Sean Feng, Jeffrey Yasskin, Ian
Clelland, Rafael Lebre, Michal Mocny, Carine Bournez, Sia Karamalegos, Mike
Jackson, Patrick Meenan, Giacomo Zecchini

Admin

- Next meeting: October 26 (later timeslot: 10am PST, 1pm EST, 5pm GMT)
- WebPerf WG Charter under AC review
<https://www.w3.org/2002/09/wbs/33280/webperf-charter-2023/>

- Have your rep vote!

Agenda

Privacy principles and ancillary data - Yoav

Recording
<https://youtu.be/6qOynNDocl8>

- Current text:
https://w3ctag.github.io/privacy-principles/#ancillary-uses
- jyasskin's proposal:
https://pr-preview.s3.amazonaws.com/jyasskin/privacy-principles/pull/361.html#ancillary-uses

- Please send asynchronous comments to
https://github.com/w3ctag/privacy-principles/pull/361,
or to issues on https://github.com/w3ctag/privacy-principles/issues
once that PR is closed.

- Yoav: Briefly talk about an effort happening between TAG and PING
- ... Privacy principles being worked on in TAG repo
- ... Also conversations around data minimizations and principles around
that
- ... Sites, user-agents, everyone should minimize personal data exposed
to web
- ... Personal data can mean anything, characteristics, how they
interact with the page, network, etc considered personal data
- ... Doc also defines ancillary uses of data (non-functional use), and
defines data used that way as ancillary as well
- ... In current version of the document, if we have a single piece of
data being used for functional reasons (e.g. click event exposes timestamp,
functional reason for that), but can also be used for ancillary uses (i.e.
when it happened), that data becomes ancillary as well depending on how
it's being used
- ... I pointed out that issue to Jeffrey, as a result there's a second
PR on that front that is trying to rewrite that section to address that
discrepancy
- ... This PR proposes instead of defining ancillary data as part of its
usage, we define two types of ancillary APIs
- ... 1. Exposes non-ancillary data through other means
- ... 2. Exposes data not exposed through non-ancillary APIs
- ... Examples of that are DNS timing in RT (Resource Timing), presentation
times in ET (Event Timing), memory measurement (new kinds of data that are
only exposed for measurement, monitoring and regression prevention purposes)
- ... Fact that some data is ancillary doesn't mean it has an outsized
privacy risk or is particularly sensitive, but it should be looked at per
data minimization principles in that doc
- ... Regarding that PR, no consensus in the task force that it's OK to
report non-ancillary data for ancillary uses
- ... "Reducing collection cost would increase data collection" and is
going against what the principle is trying to prevent
- ... Assuming PR lands in some form, we have a distinction where
ancillary data is novel data that's not available through other means
- ... Two potential mitigations
- ... 1. User permission to access that data
- ... 2. Private aggregation of that data
- ... For user permission front, in my opinion that may make sense for
info that is sensitive or requires extra debugging info for the machine
type (PII of the user), but it can be cumbersome and deter folks from using
the right APIs (and they might try to get access to this info from other
means)
- ... For private aggregation, we talked about this at TPAC last year.
Two potential shapes, in rough sketch that we have of these APIs from
privacy arena
- ... An API that is key-value based, key and measurement, which gets
uploaded to an aggregation server
- ... Browser defines metrics based on some predefined keys (metric type
and origin or URL hash) and values (the measurement itself, e.g. DNS times):
a browser-internal reporting that defines the key as a DNS metric+host, and
the value is the time it takes for the browser to measure this DNS.
- ... Sent out to an aggregation server, and RUM providers or origins
could ask the aggregation server for this data in aggregate
- ... Use in aggregated histogram form
- ... Other is some form of a worklet, where site can define code that
has access to various kinds of ancillary data, and that code cannot talk to
the page, but can output histogram distributions that would get sent to an
aggregation server
- ... Gets site more control over what gets reported, output is a
histogram
- ... Both would make it hard to present resource-specific data, unless
we can do histogram per-resource per-metric
- ... For sure this will be a significant shift from what RUM providers
are currently doing
- ... If we move to this kind of model for ancillary data, we could have
access to cross-origin data we don't see due to existing security
restrictions that we don't necessarily have in the aggregate
- ... Bring this discussion to the group's attention and gather thoughts
on that
- ... I'm trying to represent what we think when talking to privacy task
forces
- Jeffrey: Step back and share why we're looking at this at all
- ... Sense from privacy folks on the task force that this group
produces a bunch of APIs that could be turned off, and users could continue
doing what they do. Might sacrifice website's long term health
- ... Task force members want users to be able to better control what
ancillary data is used
- ... User be able to turn that off and not have that data contribute to
the site
- ... We can't turn off DOM APIs that let sites get this, APIs that
summarize that data are a lost cause
- ... What should we do about APIs that expose new information?
- ... Question that the task force asks that I don't have an answer to,
how does this group think those APIs should be constrained?
- ... Different set of constraints that this group wants to write into
privacy principles
- ... You have some constraints you're operating under, maybe not a
principle yet but you have a one-off for each API
- ... Set of principles for this group that you write under would be good
- ... APIs that expose new information should not expose new data,
aggregate or ask permission
- Dan: Question, the discussion seems a bit theoretical to me
- ... Potential for harm, but I'm not aware of concrete specific
examples of harm
- ... Discussion about cookies and privacy related to cookies
- ... Specific examples of harm are well-known and documented
- ... Do we have specific examples of harm being done using existing
APIs?
- Alex: There are a lot of requests for accessing new data that we have a
great use for to improve a website, and I believe you have use-cases, and
am sympathetic
- ... But have to consider how data could be abused also, most of it is
fingerprinting
- ... Websites want to know exactly who a user is using this website
- ... Is there high-background memory usage or CPU usage, then it's more
likely it's the same user that saw this website with same characteristics
- ... Concrete example that's giving fuzzy but useful fingerprinting data
- ... User has not indicated that they want to be fingerprinted
- ... Happy to hear we're talking about aggregate anonymous data
collection, we'd have much less objection to
- ... Want website to have data but just not have knowledge of users
from whom that data came (without their consent)
- Yoav: Main risk is fingerprint-ability, every new piece of data adds a
few bits of entropy that can be used to target the specific user
- Jeffrey: We can write a principle without saying we have to use it
right away. Sets a long-term goal for the group. We haven't designed the
aggregation APIs we need. We can still ship APIs without having to think if
they contribute to fingerprinting bits one by one.
- Yoav: To build on Dan's question and Alex's answer, I think that it's
good to look at this from the risk and mitigation perspective.
- ... Helps my case at looking at novel data vs. already-exposed data
- ... Already-exposed data is already available + active fingerprinting
data
- ... Slightly more coarse doesn't, in most cases, enable any new kinds
of attacks if we're looking at fingerprinting as the risk
- Dan: Obviously any bit of information contributes to fingerprinting,
but that's just potentially. Concrete examples?
- ... Another question, aggregate collection of data into an aggregation
server. My understanding is that it's opt-in, when do you envision the
user opting in?
- Yoav: Two things, user-permission and aggregated reporting. In my head
those are mutually exclusive
- Dan: When using aggregation, they're automatically opted-in?
- Yoav: Yes
- Dan: Critical. If it's opt-in, can we assume no data would be
captured?
- ... If it's per-site, in addition to cookies/etc, they'd have to
opt-in to performance reporting
- ... Less than ideal
- Jeffrey: Want principle to say if it's aggregated (de-personalized),
it can be used by default
- ... Users should be able to opt-out
- ... Most aggregate systems have a threshold of identifiability, maybe
the browser would be leaking something, so users could opt-out
- ... Second question for Alex, are you comfortable with an API that
exposes information already shared from a DOM, is that OK?
- Alex: Agreed, but the standard of what's exposed from the DOM is
different by different browsers. E.g. what's exposed in Chrome isn't
necessarily done so by Safari
- Jeffrey: Question of how to phrase new information is hot topic on
task force
- ... Wording suggestions there are welcome
- Alex: We already also ship on-by-default anonymous data gathering
features that users can opt-out of, but on by default for 99%+ of users
- Sia: Step back, there's going to be harms on both sides of this, so
there are tradeoffs
- ... Harm of fingerprinting, but also harm on lower-end usage
- ... Framing of that discussion, issue of equity and other issues as
well
- Benjamin: Point out that the charter explicitly has out-of-scope
performance analysis
- ... For e.g. compute pressure, I think these are out of scope, do we
need to revisit this in the charter
- Yoav: This isn't about performance analysis, but is about gathering
data
- ... Out of scope is how to analyze those bits afterwards
- Benjamin: Talking about data collection that is used for data analysis
- Yoav: sendBeacon(), fetchLater(), are about data collection that is
used for analysis. Bits in charter are around how does one process that
data after it's collected, not about how browser sends it out
- ... If we defined aggregate reporting in the WG, and there are issues
with charter around that, we can revisit at that point
- Jeffrey: To Sia's point, it's been hard to get a privacy document to
talk about tradeoffs with non-privacy goals. But it does say it doesn't
trump other web principles.
- ... May need to trade off privacy principles with other goals
- ... API may not strictly adhere to privacy principles
- Yoav: We've been talking about maybe creating monitoring and
deployment principles, or some other document that talks about the broader
good that the APIs this working group is working on do, and enable something
to anchor other principles that one can tradeoff with privacy and others
- Katie: As someone who cares deeply about privacy but works at a
company that cares about performance because of the measurable impact we
can track (e.g. conversion rate, bounce rate).
- ... Frightening to me that we'd lose fidelity about ways users stayed
on the site and converted
- ... Reality on the ground is we sell perf to companies because it will
improve their bottom-line
- ... Without being able to get something approximating that user data,
it would be hard to continue to make the case for web performance in a
corporate setting
- Sia: Are you saying it'd be hard to justify for corporations, but
maybe other organizations could?
- ... Buying things is a part of life
- Katie: There is a lot of moral ambiguity here
- ... As someone who navigates this question at work of why do we invest
in performance, it drives the bottom line
- ... I wish it wasn't that way, but it is
- Jeffrey: I think we can get performance APIs that can get that
connection
- ... Privacy Sandbox folks are tracking seeing an ad to buying something
- ... I think we can get APIs that are similarly private
- Yoav: I don't think we're talking about scrapping existing APIs and
moving to aggregated data
- ... New and aggregate data that isn't available elsewhere
- ... We still can do the tracking for that same-origin traffic and
report that data for everything web-exposed
- ... While decorating that with histograms of e.g. DNS times
- Michal: Wanted to follow-up with Benjamin's example of Compute Pressure
- ... Been consulting a few clients referencing this API
- ... Web platform feature that's not ancillary data
- ... Goal is for users that have compute-intensive features, ways to
identify ways of backing off fidelity of experience
- ... Non-ancillary data API
- ... Demand for a feature like that, might be alternatives, whatever
- ... Might be a future proposal to summarize that data for ancillary
purposes
- ... Does a site tend to be under compute pressure in the aggregate
- ... Maybe a large-scale change that the site would want to make
- ... In one earlier revision of privacy principles, users should have
to pay for that ancillary use
- ... I think we're past that
- ... Ancillary data should only be through aggregate, privacy-preserving
- Yoav: MSFT folks talked about being able to use something like Compute
Pressure as a dimension to be able to split data on, in order to
distinguish NT vs. ones on idle machines
- Nic: as a RUM provider I want to help our customers. Always wanted to
report on the entire page weight where aggregated reporting can help us get
there.
- ... but it would be hard to incorporate it. Want to be part of this
conversation
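
To make the key-value shape Yoav described a little more concrete, here is a rough TypeScript sketch. Everything in it is hypothetical: `AggregationServer`, `makeKey`, the bucket bounds and the `minReports` threshold are illustration-only names and values, not part of any existing or proposed browser API. It only shows the data flow (key = metric type + host, value = measurement, output = histogram released once enough reports contributed) and does not implement any real privacy mechanism such as noise or secure aggregation.

```typescript
// Hypothetical sketch: none of these names come from a real browser API.
interface Report {
  metric: string;  // metric type, e.g. "dns"
  host: string;    // host (or origin) the measurement applies to
  valueMs: number; // the measurement itself, in milliseconds
}

// Serialize metric type + host into a single aggregation key.
function makeKey(metric: string, host: string): string {
  return `${metric}|${host}`;
}

// Assumed bucket upper bounds in milliseconds; the last bucket is open-ended.
const BUCKET_BOUNDS_MS = [10, 50, 100, 500];

function bucketIndex(valueMs: number): number {
  const i = BUCKET_BOUNDS_MS.findIndex((bound) => valueMs < bound);
  return i === -1 ? BUCKET_BOUNDS_MS.length : i;
}

class AggregationServer {
  private histograms = new Map<string, number[]>();
  private minReports: number;

  constructor(minReports: number) {
    this.minReports = minReports;
  }

  // Called by the (hypothetical) browser-internal reporter.
  upload(report: Report): void {
    const key = makeKey(report.metric, report.host);
    let counts = this.histograms.get(key);
    if (!counts) {
      counts = new Array(BUCKET_BOUNDS_MS.length + 1).fill(0);
      this.histograms.set(key, counts);
    }
    counts[bucketIndex(report.valueMs)] += 1;
  }

  // Called by a RUM provider or origin. Returns the histogram only once
  // at least minReports reports contributed to it, otherwise null.
  query(metric: string, host: string): number[] | null {
    const counts = this.histograms.get(makeKey(metric, host));
    if (!counts) return null;
    const total = counts.reduce((a, b) => a + b, 0);
    return total >= this.minReports ? [...counts] : null;
  }
}
```

The per-key histogram also illustrates the limitation raised in the discussion: resource-specific data is only recoverable if there is a separate key (and therefore a separate histogram) per resource and per metric.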

User timing and framework use counters - Annie

Recording
<https://youtu.be/ZG0VBBmgGOs>

- Annie: No proposed API changes here, just convention on how we use
UserTiming
- ... In the UserTiming L1 spec, there were standard mark names (which we
have since pulled): mark_fully_loaded, mark_fully_visible
- ... For sites to report their own custom load times, if they don't
want to use e.g. LCP
- ... New use-case for a convention
- ... Frameworks / CMS features
- ... Image directives, a feature to add fetch priority hints
- ... Font modules, fallback fonts
- ... 3P modules allowed to load 3Ps without putting scripts into the
critical path
- ... Using them can improve performance
- ... We'd like to better understand if they're working well
- ... UserTiming mark might be a good way to know if the feature was
being used
- ... Marks are already in traces, lab tools could say e.g. using a
feature would improve
- ... RUM providers could note usage, and show A/B test results
- ... HTTP Archive could show usage
- ... Proposed syntax
- ... Wondering if others would find this convention useful
- Sia: Are you thinking about more here?
- ... Feature is NgOptimizedImage and the version is XYZ?
- Annie: Yeah if we use detail, it's type any, so you could add more
information
- Sia: I think it's interesting
- ... Rick just added to HTTP Archive capturing the shopify data
- ... This could be cool
- Yoav: I think there are two different things here
- ... HTTP Archive can expose this detail, and then people could do
complex queries on it
- ... Chrome and use-counters could also expose a predefined set
- ... If you would want a NgOptimizedImage5, could be exposed in HTTP
Archive but not use counter
- ... People can contribute a one-line to library where after approved,
it becomes a use-counter you'd get stats on usage in the wild
- Annie: For bigger orgs you can control what's in there?
- Katie: +1 to the use-case of organizations being able to define their
own feature usage
- ... If we're running a synthetic test via cron on SpeedCurve, tying it
back to the feature experiment that is running is proving difficult
- ... Automatically goes into a format that WPT could use
- ... Could open up a few doors for being able to tie performance back
to an internal feature
- Annie: Being able to split experiments
- Katie: Being able to track when this new feature is appearing
- ... How to keep track of when everything is happening and is combined
- ... Trying to figure out better ways of tracking that without it being
burdensome computationally
- ... Can see a lot of usage for this for us
- Nic: In L1 we had these fancy standard names, it’s not in L3?
- Annie: I should add it back
- Nic: And then it’d be one of those. For the features, would your team
come up with a list of suggested features? Would there be a standard for
feature names?
- Annie: Names could collide
- Nic: You mentioned existing features - angular specific
- Annie: and you could add more things that are org specific
- Pat: Would help to add guidance to consider when you log the mark
- ... e.g. use-counters mark the time they were first used on the page
- ... may make sense for some features
- Yoav: I think we can benefit from standard names, if we want to
reflect this in Chrome use-counters, and we want WPT or Lighthouse to say
smart things about framework features, I think we should have a
standardized or wiki'd list.
- ... Would have to be well-known in Chromium code
- ... e.g. there could be collisions
- ... Add some namespace, and a way of serializing namespace and feature
so it avoids collisions
- Annie: Will add a Github issue and link here
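
As an illustration of the namespace idea Yoav raised, here is a minimal TypeScript sketch of how a namespace and a feature name could be serialized into a UserTiming mark name without collisions. The proposed syntax shown on the call is not captured in these minutes, so the `feature:` prefix, the `/` separator and all function names below are assumptions for illustration only. The one real API involved is `performance.mark(name, { detail })` from User Timing Level 3, and it is reached through a small interface so the sketch does not depend on a browser global.

```typescript
// Hypothetical convention: the prefix and separator are assumptions,
// not the syntax proposed on the call.
interface FeatureMark {
  namespace: string; // e.g. a framework or organization name
  feature: string;   // e.g. "NgOptimizedImage"
}

const PREFIX = "feature:";

// Percent-encode both parts so a "/" inside either cannot cause a collision.
function serializeFeatureMark(mark: FeatureMark): string {
  return (
    PREFIX +
    encodeURIComponent(mark.namespace) +
    "/" +
    encodeURIComponent(mark.feature)
  );
}

// Returns null for mark names that do not follow the convention.
function parseFeatureMark(name: string): FeatureMark | null {
  if (!name.startsWith(PREFIX)) return null;
  const [ns, feat, ...rest] = name.slice(PREFIX.length).split("/");
  if (ns === undefined || feat === undefined || rest.length > 0) return null;
  try {
    return {
      namespace: decodeURIComponent(ns),
      feature: decodeURIComponent(feat),
    };
  } catch {
    return null; // malformed percent-encoding
  }
}

// Minimal shape of what we need from the Performance interface.
interface MarkTarget {
  mark(name: string, options?: { detail?: unknown }): unknown;
}

// Records that a feature was used; extra info (e.g. a version) goes in detail.
function markFeatureUse(
  perf: MarkTarget,
  mark: FeatureMark,
  detail?: unknown
): string {
  const name = serializeFeatureMark(mark);
  perf.mark(name, { detail });
  return name;
}
```

In a page, a framework might call `markFeatureUse(performance, { namespace: "angular", feature: "NgOptimizedImage" }, { version: "XYZ" })`, and a lab tool or crawler could then run `parseFeatureMark` over the names of the page's mark entries to find which features were used.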

- Nic
http://nicj.net/
@NicJ

On Wed, Oct 11, 2023 at 4:12 PM Yoav Weiss <yoavweiss@google.com> wrote:

> Hey folks!
>
> On tomorrow's call <https://meet.google.com/agz-fbji-spp> we'll talk
> <https://docs.google.com/document/d/10dz_7QM5XCNsGeI63R864lF9gFqlqQD37B4q8Q46LMM/edit?pli=1#heading=h.w6d4g336zmxa>
> about privacy principles & ancillary data, framework use counters and have
> a short navigation ID update.
>
> See y'all there! :)
>
> Cheers,
> Yoav
>

Received on Monday, 6 November 2023 14:10:18 UTC