Re: WebPerfWG call - June 22nd 8am PT/11am ET/5pm CET

Belatedly publishing the minutes
<https://w3c.github.io/web-performance/meetings/2023/2023-06-22/index.html>
and recording <https://youtu.be/yyVMVzbGqd0> for this. Apologies!

Copying the minutes here for convenience:
WebPerfWG call - June 22nd 2023
Participants

Nic Jansma, Giacomo Zecchini, Dan Shappir, Ian Clelland, Hao Liu, Michal
Mocny, Neil Craig, Amiya Gupta, Patricija Cerkaite, Leon Brocard, Andy
Luhrs, Sia Karamalegos, Carine Bournez, Patrick Meenan, Sean Feng
Admin

   - Next call - July 20th 1pm PT (!!)


   - (skipping July ~4th week)

Minutes

timeOrigin
<https://docs.google.com/presentation/d/1RW_fLZwUevn7IBXE3hFvKMnaEBs83Dxp3_yLflEfAXk/edit?usp=sharing>
- Nic

Recording
<https://youtu.be/yyVMVzbGqd0>


   - *Nic*: dug into the timeOrigin attribute in the last few weeks
   - … use a lot of the performance APIs and need to convert timestamps
   into wall time
   - … When reviewing the JS code for boomerang.js, found it uses
   timeOrigin and performance.timing.navigationStart interchangeably
   - … would try to move to timeOrigin going forward, but wanted to talk to
   the group about it
   - … The first NavTiming included performance.timing.navigationStart, a
   large number giving the distance from the epoch (in milliseconds)
   - … Deprecated but still shipped everywhere
   - … DOMHighResTimeStamp values from performance.now() and other APIs
   give you timestamps in milliseconds from the navigation start
   - … So we can do math with them
   - … Then we added timeOrigin, potentially with higher resolution than
   navigationStart
   - … Basically using timeOrigin where available and navigationStart when
   it isn’t
   - … Moved to timeOrigin and got a lot of issues with their test suite
   - … In some cases, timeOrigin and navigationStart show different
   distance from epoch
   - … In Firefox/Safari they could be +- 1 ms (maybe due to rounding)
   - … In Chrome there’s a difference between them because timeOrigin is
   more precise
   - … Seems like these numbers should be the same
   - … But can they be different?
   - … We’ve said in the past that they should match, but looking through
   the spec they may be different when the browser is launched from a blank
   slate
   - … Unclear if the spec allows them to be different
   - … Also, seen browser bugs around all of these
   - … Drift in timeOrigin in Chromium, resulting in it being different from
   what you’d expect
   - … Firefox has a different issue where the timeOrigin was shared
   between different tabs in the process
   - … Safari has an issue where timeOrigin drifts over time. Many days
   after the page was opened, there was quite a gap from its original value
   - … If there’s an intentional difference between timeOrigin and
   navigationStart, how can the navigationEntry’s startTime be 0? Some logical
   gap there
   - … RUM can’t use timeOrigin at all ATM, which is not ideal
   - … Should timeOrigin and navigationStart be the same?
   - *Yoav*: In my opinion they should be the same
   - … Any discrepancies we’re seeing are there because no one cares a lot
   about the deprecated API
   - … On the implementation side, there’s no energy looking at the
   deprecated API
   - … Talking to Noam, it seems like all issues you’re outlining are
   symptoms that we have two different clocks that are inherently misaligned
   and we’re trying to align
   - … Wall-clock that is NTP corrected, or a user could change, that is
   one clock
   - … Tied to some form of external reality
   - … Then we have a monotonic clock that is counting the seconds that the
   browser sees, render process sees
   - … Depending on the implementation it does different things
   - … Sync’d once render process starts (or when window is created?),
   synced to system clock and gets timeOrigin value
   - … Could get out of sync
   - … Because hibernation, etc
   - … In the Chromium bug, we changed syncing from every time a window is
   created to once when a renderer process is created.  That created huge
   drifts, because render processes can be impacted by sync on timeOrigin
   - … Any change in syncing frequency will impact that drift
   - … One symptom is that navigationStart and timeOrigin were further
   apart
   - … Essentially navigationStart timestamps are more constantly syncing,
   or more point-in-time, than timeOrigin
   - … If this is indeed a pain-point, that seems fixable.
   - … Question is why does it matter
   - *Nic*: In our RUM script we prefer to always use the same monotonic
   clock. If the system clock changes, that doesn’t matter to us
   - … At other times, we’re sending wall clock timestamps just for
   convenience
   - … Every time we do that, we have a mixture of timestamps from both
   clocks, and they drift
   - … Given 2 timestamps that are different - why are they different and
   which one should I use?
   - … If the answer is “timeOrigin”, we need all the browsers to fix their
   bugs
   - … e.g. in Safari timeOrigin drifts
   - *Yoav*: If we were to fix navigationStart to have the same sync points
   as timeOrigin, timeOrigin were not sync’ing mid-way (like in Safari), and we
   had well-defined sync points in the spec, everywhere?
   - … If those two things are correct, could you ignore wallclock
   - … Use timeOrigin as your anchor to reality?
   - *Nic*: We never compare to Date.now(), so it’s more about having a
   consistent stable reality. Even if there’s drift that’s fine. Just need the
   timestamp to be stable
   - … Some of the reasons we’re seeing this is code written a long time
   ago that’s using a mix of those different APIs, resulting in us seeing the
   drift in actual measurement
   - … Chrome’s navStart and timeOrigin are the same. Other browsers see
   larger drift
   - … Open issue for Firefox, none for Safari
   - *Sean*: can check the Firefox bug
   - *Nic*: Was able to repro in an older version. May have been fixed
   - … May not be an issue for other folks
   - Nic to follow-up with Safari on bug:
   https://bugs.webkit.org/show_bug.cgi?id=258572
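
To illustrate the anchoring question discussed above, here is a minimal TypeScript sketch of "use timeOrigin where available and navigationStart when it isn't", plus a helper that exposes how far apart the two anchors are. The names PerfLike, pickEpochAnchor, toEpochMs and anchorSkewMs are hypothetical and are not taken from boomerang.js; the sketch only assumes that both anchors are expressed in milliseconds since the epoch.

```typescript
// Minimal shape of the parts of `performance` that this sketch reads.
interface PerfLike {
  timeOrigin?: number;
  timing?: { navigationStart: number };
}

// Pick ONE epoch anchor (ms since the Unix epoch) and use it consistently.
// Prefers timeOrigin and falls back to the deprecated navigationStart.
function pickEpochAnchor(perf: PerfLike): number | undefined {
  if (typeof perf.timeOrigin === "number") return perf.timeOrigin;
  return perf.timing?.navigationStart;
}

// Convert a relative timestamp (ms since the anchor, as returned by
// performance.now() or an entry's startTime) into epoch milliseconds.
function toEpochMs(anchor: number, relativeMs: number): number {
  return anchor + relativeMs;
}

// Difference between the two anchors when both exist, useful for
// checking whether a given browser reports them consistently.
function anchorSkewMs(perf: PerfLike): number | undefined {
  if (typeof perf.timeOrigin !== "number" || !perf.timing) return undefined;
  return perf.timeOrigin - perf.timing.navigationStart;
}
```

In a page, the global performance object could be passed to these helpers directly. Mixing the two anchors within one beacon is what makes any skew visible in the data, so the sketch picks a single anchor up front.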


NEL suborigin policies should not be able to generate reports on success
<https://github.com/w3c/network-error-logging/issues/141>

   - *Ian*: Had 2 different mitigations in NEL, but it turned out that
   subdomains were able to send success reports when they shouldn’t have.
   - … An attacker who could hijack DNS, and also inject a NEL
   policy, could set a policy with a success fraction. Then, when the DNS
   issue is fixed:


   - 1. All reports *including success reports* are "downgraded" to DNS
   errors, because of the IP change, and
   - 2. Subdomain policies are only allowed to trigger DNS error reports, so
   - 3. The success report is delivered, masked as a DNS error report.


   - … Fixed in the spec, still an implementation issue
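
The ordering problem in the three steps above can be sketched as follows. This is an illustrative TypeScript model, not the spec's algorithm text: the Policy and Outcome shapes and the reportFor function are hypothetical, and the only assumption taken from NEL itself is the "ok" and "dns.address_changed" report types. The point is that the success check has to look at the original outcome, before any downgrade to a DNS error.

```typescript
type Phase = "dns" | "connection" | "application";

interface Outcome {
  phase: Phase;
  type: string;     // e.g. "ok" for a successful request
  serverIp: string; // IP the request was actually served from
}

interface Policy {
  receivedIp: string;        // IP the policy was originally received from
  isSubdomainMatch: boolean; // policy applies via a parent domain's include_subdomains
}

// Returns the report to queue, or null when no report should be generated.
function reportFor(policy: Policy, outcome: Outcome): Outcome | null {
  // The fix: decide on the ORIGINAL outcome. A subdomain policy must never
  // produce anything derived from a success report.
  if (policy.isSubdomainMatch && outcome.type === "ok") return null;

  // Downgrade: the server IP no longer matches the IP that delivered the policy.
  let report = outcome;
  if (outcome.phase !== "dns" && outcome.serverIp !== policy.receivedIp) {
    report = { ...outcome, phase: "dns", type: "dns.address_changed" };
  }

  // Subdomain policies are only allowed to trigger DNS error reports.
  if (policy.isSubdomainMatch && report.phase !== "dns") return null;
  return report;
}
```

Without the first check, a success on a subdomain whose IP had changed would be downgraded and then pass the DNS-only check, which is the leak described in the minutes.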

Remove support for HTTP Trailers
<https://github.com/w3c/network-error-logging/issues/146>

   - *Ian*: This came up as a result of a bad reference, because trailers
   were removed from the Fetch spec.
   - … NEL’s processing algorithms currently get the headers from headers
   and trailers
   - … This was never actually implemented anywhere
   - … Also, because trailers aren’t supported elsewhere, that’s fine
   - … option 1 - remove trailers from NEL spec and then implement request
   and response headers
   - … option 2 - remove request headers and response headers
   - … option 3 - implement with full trailer support
   - *Neil*: On response headers, I wasn’t aware that it wasn’t implemented
   - … thought of using it to denote which HTTP edge service was active for
   a particular network report
   - … serve based on geography, but that changes over time
   - … Would love to not have to correlate NEL reports with time to figure
   out which edge servers were used for a particular request
   - *Ian*: Definitely makes it easier to justify the work
   - … bumped into that feature accidentally, would be a good use case for
   us
   - … Not sure which response header needs to be specified to indicate an
   edge server
   - *Yoav*: Trailer support was removed from Fetch, but Server-Timing has
   specified trailer support (implemented in Firefox).
   - … Worthwhile to see what the situation is there
   - … Spec has trailer support with some implementation backing (in
   Firefox)
   - *Ian*: Not sure if Lucas is on the call, but had mentioned Noam had
   removed that section last year
   - *Yoav*: Maybe we fixed the spec problem but made Firefox non-compliant
   in the process? Not sure
   - … I know Fastly were keen on trailer support for Server-Timing
   - … If implemented and used in the wild we should have it be part of the
   spec
   - … Though for Fetch a single implementation is not enough, and it would
   have to be monkey patched
   - *Ian*: Mentioned in HTTP spec
   - *Patrick*: On headers for in-house vs. CDN: is the server IP reliable?
   It is available for a lot more failure conditions than headers would be.
   Headers would only work for 4xx HTTP codes, where the edge was reachable but
   the origin wasn’t.  Server IP should be there for anything but DNS.
   - *Neil*: Ideal world would have both.  It would be marginally easier if
   both were available.
   - … One of the things that cropped up is we see a lot of abandoned
   events, if we run our own CDN vs. commercial CDN, and want to know more
   about what’s causing that
   - *Patrick*: Useful in general for finding which edge node, e.g. Anycast
   situation, but the kinds of errors NEL is able to report skew heavily
   towards not being able to reach the server or edge.
   - *Neil*: Our NEL data is the other way, abandoned is a large chunk.
    ~90% abandoned and unknown.  They’re tricky to take action on.
   - *Patrick*: For abandoned you don’t get headers, maybe?  Started to
   respond and didn’t finish responding?  More likely didn’t even get to
   respond.
   - … Could be valuable to have both when debugging since they’re both
   hard to figure out
   - *Ian*: Sounds like some support for adding response headers to NEL
   - *Neil*: For us specifically, request headers isn’t useful, response
   headers are
   - *Ian*: As far as trailers go, I can’t tell if they’re useful. They’re
   only in Firefox (which doesn’t have NEL), so I suspect it’s best to just
   pull that out of the spec text for now
   - *Yoav*: Unless there is actual implementation interest to implement
   it, it should just be removed from the spec
   - … If and when there’s interest, we can just add it back
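
As a sketch of the use case Neil describes, a NEL policy could list a response header that identifies the edge service, so the value travels with each report. This assumes the response_headers policy member from the NEL draft; whether a given browser honors it is not established by these minutes. The group name "nel-endpoint" and the header name "X-Edge-Node" are hypothetical.

```typescript
// Subset of NEL policy members used in this sketch.
interface NelPolicy {
  report_to: string;
  max_age: number;
  include_subdomains?: boolean;
  success_fraction?: number;
  failure_fraction?: number;
  response_headers?: string[];
}

// The NEL response header carries the policy as a JSON object.
function nelHeaderValue(policy: NelPolicy): string {
  return JSON.stringify(policy);
}

const edgePolicy: NelPolicy = {
  report_to: "nel-endpoint",         // hypothetical reporting group name
  max_age: 86400,                    // policy lifetime in seconds
  failure_fraction: 1,               // report every failure
  response_headers: ["X-Edge-Node"], // hypothetical header naming the edge service
};

const headerValue = nelHeaderValue(edgePolicy);
```

As Patrick notes above, a response header can only be present when a response actually started, so this would not help for failures where the edge was never reached.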

Could there be a way to collect Crash/Intervention reports without
collecting Depreciation reports?
<https://github.com/w3c/reporting/issues/263>

   - *Ian*: Should all these features be configurable independently of each
   other? Seems like a good idea. Unclear where that configuration goes
   - … No current special headers for each one of them. So we could have a
   special reporting name for each of these?
   - … Or maybe something else for reporting configuration in general
   - *Neil*: Might be worthwhile to do something similar to what we’re
   talking about at NEL filtering, where you filter reports you don’t want to
   include
   - … The vast majority of reports we get are deprecations, which are
   being ignored
   - ... so could avoid receiving them
   - *Ian*: The report-to header defined the endpoint and other headers
   defined what goes into the reports
   - … putting the filter on the reporting headers makes sense. Not sure
   that the same is true on the reports
   - *Neil*: There isn’t a deprecation reporting header though
   - *Andy*: Looking at sampling recently, some overlap there. Zero
   sampling could handle opt-outs entirely
   - *Yoav*: I’d love to come back to ignoring deprecation reports.
   - … Deprecation reports are telling you that you’ll have breakage in X
   months, and ignoring means you will have breakage then
   - *Neil*: Org challenges where engineering teams are detached from our
   reports, but may have their own ways of tracking this
   - *Yoav*: No way to translate those reports you’re collecting into
   alerts on their end
   - *Neil*: Not automatically.  Mechanism is I raise an alert with our
   level1 response team, and they can raise with appropriate team, but that
   may be noisy.
   - … We foot the bill for the header deprecation reports as well.
   - … Would be nice to be able to control
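
Andy's point that zero sampling could double as an opt-out can be sketched as per-type sampling rates. The TypeScript below is a hypothetical collector-side model only: the issue asks for a way to configure this in the browser, which would also avoid the delivery cost, and no such configuration is defined here. The Report shape is reduced to the fields the sketch needs, and sampleReports is an assumed name.

```typescript
// Reduced shape of a Reporting API report as received by a collector.
interface Report {
  type: string; // e.g. "deprecation", "intervention", "crash"
  url: string;
  body: unknown;
}

// Sampling rate per report type, in the range 0..1. A rate of 0 is an opt-out.
type SampleRates = Record<string, number>;

// Keeps each report with the probability configured for its type.
// Types without a configured rate are always kept.
function sampleReports(
  reports: Report[],
  rates: SampleRates,
  random: () => number = Math.random,
): Report[] {
  return reports.filter((report) => {
    const rate = rates[report.type] ?? 1;
    return random() < rate; // random() is in [0, 1), so rate 0 never passes
  });
}
```

Passing the random source as a parameter keeps the function deterministic under test; a call such as sampleReports(batch, { deprecation: 0 }) would drop deprecation reports while keeping crash and intervention reports.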




On Wed, Jun 21, 2023 at 10:33 PM Yoav Weiss <yoavweiss@google.com> wrote:

> Hey folks,
>
> Join <https://meet.google.com/agz-fbji-spp?hs=122&authuser=0> us tomorrow
> to talk about WebPerf! On the agenda
> <https://docs.google.com/document/d/10dz_7QM5XCNsGeI63R864lF9gFqlqQD37B4q8Q46LMM/edit?pli=1#heading=h.3sdl3h8a6dkh>
> we have a discussion about timeOrigin, followed by NEL and Reporting issues.
>
> See y'all there! :)
>
> Cheers,
> Yoav
>

Received on Wednesday, 26 July 2023 09:44:25 UTC