RE: [navigation-error-logging] remove "navigation" requirement? from Todd Reifsteck on 2015-02-13 (public-web-perf@w3.org from February 2015)

From: Todd Reifsteck <toddreif@microsoft.com>
Date: Fri, 13 Feb 2015 19:10:25 +0000
To: Ilya Grigorik <igrigorik@google.com>, public-web-perf <public-web-perf@w3.org>
CC: "Aaron Heady (BING AVAILABILITY)" <aheady@microsoft.com>
Message-ID: <BN1PR0301MB061272BD82296DEF246457B4C2230@BN1PR0301MB0612.namprd03.prod.outlook.>
Looks good on a quick breeze. Go ahead and merge.

From: Ilya Grigorik [mailto:igrigorik@google.com]
Sent: Friday, February 13, 2015 10:56 AM
To: public-web-perf
Cc: Aaron Heady (BING AVAILABILITY)
Subject: Re: [navigation-error-logging] remove "navigation" requirement?

First run at removing the "navigation" restriction:

- Preview: https://rawgit.com/w3c/navigation-error-logging/drop-navigation/index.html

- Diff: https://github.com/w3c/navigation-error-logging/compare/drop-navigation


I've also added a "use cases" section based on the feedback from our call earlier this week.

Any objections to me merging this into main branch?

ig


On Wed, Feb 11, 2015 at 3:13 PM, Aaron Heady (BING AVAILABILITY) <aheady@microsoft.com> wrote:
Agree with everything you said. Taking a look at this:

That said, I think we'd need to carefully think through all the various scenarios here due to the cross-origin bits, e.g. who opts in, is there an opt-out, restrictions on who gets those reports, what data is shared, and so on.

Thanks,

Aaron


From: Ilya Grigorik [mailto:igrigorik@google.com]
Sent: Wednesday, February 11, 2015 11:41 AM
To: Aaron Heady (BING AVAILABILITY)
Cc: public-web-perf
Subject: Re: [navigation-error-logging] remove "navigation" requirement?

On Wed, Feb 11, 2015 at 9:08 AM, Aaron Heady (BING AVAILABILITY) <aheady@microsoft.com> wrote:
Glad you clarified this in the context of the new policy model. In the previous model, I always expected the thing.js error would have been put into the ‘error array’ for widget.com and pulled by them at some point in the future if they were interested.

Rehash of our previous discussion, but... That's insufficient for many cases. For example, I'm a popular image/meme site whose content is being embedded across the web: the embedded resources are images (not scripts, hence can't "self instrument"), and very few people (relatively speaking) ever come to my site directly, instead they only see the embeds. As a result, my ability to collect these reports is very limited. Worse, and very likely, the embedded resources are also served from a different host which is not a destination that visitors would actually go to (e.g. static.cdn.mysite.com), and as a result I can't collect any error reports ever. In short, we need policy-based registration + UA delivery :-)

This clarifies that by saying widget.com registered their interest by setting the policy in the browser, so send widget.com the error info.

Right, and to make it concrete, if the embedded resource is served over HTTPS:
- Embedded resource specifies NEL policy in its headers alongside its regular response
- UA registers policy after successfully loading the embedded resource for the first time (regardless of where it's embedded)
- If UA fails to load said resource in the future, it automatically beacons the error report to specified report-uri
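The three steps above can be sketched as a toy model of the UA's policy store. This is only an illustration under stated assumptions: the class name, method names, the report fields, the "tls-error" label, and the collector URL https://widget.com/nel-report are all hypothetical and are not taken from the NEL draft.

```python
from urllib.parse import urlsplit


class NelPolicyStore:
    """Toy model of a user agent's NEL policy store (hypothetical, not from the draft)."""

    def __init__(self):
        self._policies = {}  # host -> report-uri

    def on_response(self, url, nel_report_uri=None):
        """Register a policy after a successful HTTPS fetch whose response carried one."""
        parts = urlsplit(url)
        if parts.scheme == "https" and nel_report_uri:
            self._policies[parts.hostname] = nel_report_uri

    def on_failure(self, url, error, embedder=None):
        """Return (report_uri, report) for a known NEL host, else None.

        `embedder` is accepted but deliberately ignored: the policy applies
        regardless of where the resource is embedded.
        """
        report_uri = self._policies.get(urlsplit(url).hostname)
        if report_uri is None:
            return None
        return report_uri, {"url": url, "error": error}


store = NelPolicyStore()
# First successful load of the embed registers widget.com's policy.
store.on_response("https://widget.com/thing.js", "https://widget.com/nel-report")
# A later failure on example.com's page yields a report for widget.com's collector.
result = store.on_failure(
    "https://widget.com/thing.js", "tls-error", embedder="https://example.com"
)
# A host that never delivered a policy produces no report.
unknown = store.on_failure("https://other.test/x.js", "tls-error")
```

The sketch keys policies by host only and leaves out expiry, subdomains, and actual report delivery.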

It does open up one possibility though. Could the NEL policy for widget.com indicate that it’s okay to share the thing.js error with the caller, example.com? Sort of a CORS attribute inside the policy that would allow partners to easily share error information with each other. That would standardize access to the error information and prevent the custom subresource work required to “instrument subsequent fetches and listen to onerror callbacks, etc.”

That would expand your description:
Assuming "navigation" restriction is removed, the workflow for above example would be:
(a) widget.com registers an NEL policy WITH CORS FOR example.com
(b) user visits example.com with widget.com resource that fails to load
(c) user agent triggers an NEL report [2] to widget.com indicating an error
(d) user agent triggers an NEL report [2] to example.com indicating an error
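As a rough sketch of steps (a) through (d): the "share_with" field, the report_targets helper, and the collector URLs below are hypothetical names chosen for illustration, since nothing like them is defined in the draft. The sketch also assumes that example.com has registered an NEL policy of its own, which is where its collector address comes from.

```python
def report_targets(policies, resource_host, embedder_host):
    """Return the collectors that would receive a report for a failed fetch.

    `policies` maps host -> {"report_uri": ..., "share_with": [...]}.
    The "share_with" list is a hypothetical stand-in for the proposed
    CORS-like attribute inside the policy.
    """
    policy = policies.get(resource_host)
    if policy is None:
        return []  # not a known NEL host: nothing to report
    targets = [policy["report_uri"]]  # step (c): report to the resource's host
    embedder_policy = policies.get(embedder_host)
    if embedder_host in policy.get("share_with", ()) and embedder_policy:
        targets.append(embedder_policy["report_uri"])  # step (d): share with embedder
    return targets


policies = {
    "widget.com": {
        "report_uri": "https://widget.com/nel-report",
        "share_with": ["example.com"],
    },
    "example.com": {"report_uri": "https://example.com/nel-report"},
}

shared = report_targets(policies, "widget.com", "example.com")
not_shared = report_targets(policies, "widget.com", "other.test")
```

Here `shared` contains both collectors, while `not_shared` contains only widget.com's, because other.test is not on the hypothetical allow list.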

Yep, this is a really interesting use case. In particular, it seems like it could go a long way towards helping site operators get a handle on reliability of third party embeds - e.g. a widget provider is blocked or malfunctioning and it's affecting the performance of my site (or worst case, SPOF). That said, I think we'd need to carefully think through all the various scenarios here due to the cross-origin bits, e.g. who opts in, is there an opt-out, restrictions on who gets those reports, what data is shared, and so on.

As a starting point, I think the default policy is: if the UA failed to fetch a resource, and this resource belongs to a "known NEL host", then the user agent should beacon the error report to the collectors specified by the NEL policy of that host. This applies regardless of whether the resource fetch is a navigation or a subresource request, and in the latter case regardless of the origin from where it is being loaded.

ig

From: Ilya Grigorik [mailto:igrigorik@google.com]
Sent: Tuesday, February 10, 2015 4:23 PM
To: public-web-perf
Subject: [navigation-error-logging] remove "navigation" requirement?

tl;dr: I propose we remove "navigation" from Navigation Error Logging.

The original and the new drafts of NEL have been scoped to "navigation requests" with the premise that a failure during the navigation sequence is not observable and can't be logged by the application. By contrast (our current premise) the application *can* observe subresource failures - e.g. once the page loads the application can instrument subsequent fetches and listen to onerror callbacks, etc. Hence, we scoped NEL to navigations only.

However, as I'm iterating on the spec, it's becoming more and more clear that the above premise is not true and is, in fact, very limiting:

a) The application can observe failures (e.g. onerror callbacks) of subresource fetches, but it cannot get the same fidelity of information about the failure - e.g. detailed DNS/TCP/TLS errors, etc.
b) A resource may belong to a "known NEL host" [1] but is embedded on a third-party origin: if said resource fails to load there is no way for the NEL host to know (or instrument, even), that a failure occurred.

The (b) case is particularly painful. Consider the following example: widget.com provides a popular thing.js script that is embedded by many sites across the web... User visits example.com that embeds the widget.com/thing.js resource on its page but the resource fetch fails due to a TLS error. Today, the thing.js fetch falls outside the scope of "navigation request": no report is generated, widget.com remains in the dark about this issue... both example.com and widget.com are sad.

Given the prevalence of third-party resources (and the fact that they are often a SPOF for many sites) the inability to address the above embedding use case with NEL is a huge gap.

That said, the good news is, I think it's also easy to fix. We don't need "resource error logging", I think we just need to drop the "navigation" requirement from the current NEL draft. Everything else would remain the same and we wouldn't need to define yet another alternative mechanism to address the (a) and (b) limitations described above.

Assuming "navigation" restriction is removed, the workflow for above example would be:
(a) widget.com registers an NEL policy
(b) user visits example.com with widget.com resource that fails to load
(c) user agent triggers an NEL report [2] to widget.com indicating an error

Thoughts, objections?

ig

[1] https://w3c.github.io/navigation-error-logging/#policy-storage-and-maintenance

[2] https://w3c.github.io/navigation-error-logging/#sample-navigation-error-report
Received on Friday, 13 February 2015 21:27:54 UTC