RE: drop "Resource Error Logging" in favor of failed fetches in Resource Timing?

I’m inclined to keep Resource Error Logging separate from Resource Timing, if for no other reason than that one (RT) is focused primarily on measuring the success case and the others (REL and ultimately Nav Error Logging) are focused on the failure case. As standards, they will evolve with different demands and use cases.

Ilya,
Very nice write-up in the Domain Reliability Monitoring doc.

From: Ilya Grigorik [mailto:igrigorik@google.com]
Sent: Friday, October 17, 2014 9:14 AM
To: Ben Maurer
Cc: public-web-perf
Subject: Re: drop "Resource Error Logging" in favor of failed fetches in Resource Timing?

On Fri, Oct 17, 2014 at 12:40 AM, Ben Maurer <ben.maurer@gmail.com> wrote:
One potential reason to go with the REL approach is that there is always the possibility that the resource containing the actual logging code is the one that failed to load (i.e., a failure in the CDN that served your JavaScript).

Note that REL does *not* provide UA-reporting infrastructure - that only applies to Nav Error Logging. The assumption is that to fetch a subresource you must have successfully loaded the parent page first, and if that's the case, then you can ship some error code within it to listen for subresource fetches and log their status appropriately - e.g. via Beacon.
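
The in-page approach described above could be sketched roughly as follows. This is only an illustration, not part of any draft: the names `installFailureLogger`, `resourceUrl`, `FailureReport`, and the `/log` endpoint are assumptions. The event source and the beacon sender are passed in as parameters so the sketch also runs outside a browser.

```typescript
// Minimal stand-ins for the browser objects (assumed shapes, for illustration).
interface ErrorSource {
  addEventListener(
    type: string,
    listener: (event: { target: unknown }) => void,
    useCapture: boolean
  ): void;
}
type BeaconSender = (url: string, body: string) => boolean;

// Hypothetical report shape; nothing here is defined by the REL draft.
interface FailureReport {
  url: string;
  time: number;
}

// Return the URL of the element that failed to load, or null if the event
// target carries neither a non-empty `src` nor a non-empty `href`.
function resourceUrl(target: unknown): string | null {
  if (typeof target !== "object" || target === null) return null;
  const t = target as { src?: unknown; href?: unknown };
  if (typeof t.src === "string" && t.src !== "") return t.src;
  if (typeof t.href === "string" && t.href !== "") return t.href;
  return null;
}

function installFailureLogger(
  source: ErrorSource,
  send: BeaconSender,
  endpoint: string
): void {
  // Resource load errors do not bubble, so listen in the capture phase.
  source.addEventListener(
    "error",
    (event) => {
      const url = resourceUrl(event.target);
      if (url === null) return;
      const report: FailureReport = { url, time: Date.now() };
      send(endpoint, JSON.stringify(report));
    },
    true
  );
}
```

In a page this would presumably be wired up with `window` as the source and a thin wrapper around `navigator.sendBeacon` as the sender. Note that it only works if the script containing it loaded successfully, which is exactly the concern raised above.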

In addition, I think you want more data than just timing data. For example, if the failure happened during SSL connection establishment, it would be useful if REL could tell you whether the cause was an invalid cert, a connection timeout, etc.

That's a great point. At the moment our error types are { "dns", "tcp", "ssl", "http", "abandoned" } [1], but it seems reasonable to extend that to provide more granular reporting. Case in point, Chrome's Domain Reliability Reporting does exactly that, see "status strings" in this doc: https://docs.google.com/a/chromium.org/document/d/14U0YA4dlzNYciq2ke0StEMjomdBUN6ocSt1kN03HJ0s/edit?pli=1#heading=h.r0q86f3dy4cq


I'm not saying we need to match the above list, but it may make sense to consider some of the above (now or in the future), and those conditions are not something you can get or infer from ResourceTiming timestamps.
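
To make the idea of more granular reporting concrete, here is one possible shape, offered only as a sketch. The five coarse types are the ones quoted above; the dot-separated granular strings (such as `ssl.cert_invalid`) and the `coarseType` helper are hypothetical and are not taken from the draft or from Chrome's status strings.

```typescript
// The five coarse error types quoted in the message above.
type ErrorType = "dns" | "tcp" | "ssl" | "http" | "abandoned";

const ERROR_TYPES: readonly ErrorType[] = ["dns", "tcp", "ssl", "http", "abandoned"];

// Map a granular, dot-separated status string (hypothetical format, e.g.
// "ssl.cert_invalid") back to its coarse type. Returns null when the
// prefix is not one of the five known types.
function coarseType(status: string): ErrorType | null {
  const prefix = status.split(".")[0];
  for (const t of ERROR_TYPES) {
    if (t === prefix) return t;
  }
  return null;
}
```

With a scheme like this, a consumer that only understands the coarse types could still bucket newer, more detailed statuses by their prefix.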

ig

[1] https://dvcs.w3.org/hg/webperf/raw-file/tip/specs/NavigationErrorLogging/Overview.html#NavigationErrorType




On Fri, Oct 17, 2014 at 12:06 AM, Ilya Grigorik <igrigorik@google.com> wrote:
Navigation Error Logging [1] captures error data for document navigations + provides out-of-band reporting functionality when the page load fails. Resource Error Logging [2] provides error data for subresource fetches: same ResourceErrorLogging entry, but no automated reporting, just the JS API (you can report yourself).

During yesterday's conference call Arvind raised the option of dropping the Resource Error Logging spec in favor of simply exposing failed fetches as part of Resource Timing: you can set up an onload/onerror callback to detect failed subresource fetches, and RT would give you enough timing data to detect where the fetch failed.

1) The RT route is more involved for the developer, as it requires setting up callbacks, etc.
2) The RT route can't distinguish the "abandoned" case, e.g. an image is loading but the user hits stop. That's a very different error from a fetch failing midway due to other connection errors.
3) The RT route is much simpler from a spec-surface perspective...
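
For illustration, inferring the failure point from timing data on the RT route might look something like the sketch below. It rests on a loud assumption that is not current RT behavior: that a failed fetch would be exposed with 0 for every milestone it never reached. The field names are existing Resource Timing attributes; `TimingLike`, `FailedPhase`, and `inferFailedPhase` are hypothetical.

```typescript
// Subset of Resource Timing attributes used for the inference.
interface TimingLike {
  domainLookupEnd: number;
  connectEnd: number;
  secureConnectionStart: number;
  responseStart: number;
}

type FailedPhase = "dns" | "tcp" | "ssl" | "http" | "unknown";

// ASSUMPTION: milestones the fetch never reached are reported as 0.
function inferFailedPhase(t: TimingLike): FailedPhase {
  // DNS resolution never finished.
  if (t.domainLookupEnd === 0) return "dns";
  // Connection never completed; if the secure handshake had started,
  // attribute the failure to it, otherwise to the TCP connect.
  if (t.connectEnd === 0) return t.secureConnectionStart > 0 ? "ssl" : "tcp";
  // Connected, but no response bytes ever arrived.
  if (t.responseStart === 0) return "http";
  // Failed after the response started, or did not fail at all.
  return "unknown";
}
```

Note that a fetch the user abandoned mid-connect and one that failed mid-connect would produce the same timestamps here, which is the limitation described in (2).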

At the moment my personal preference would be to keep Resource Error Logging: simpler and more intuitive for the developer; I think (2) is a strong enough reason on its own. That said, curious what others think?

ig

[1] http://w3c.github.io/web-performance/specs/NavigationErrorLogging/Overview.html

[2] http://w3c.github.io/web-performance/specs/ResourceErrorLogging/Overview.html

Received on Friday, 17 October 2014 18:31:38 UTC