Re: [navigation-error-logging] remove "navigation" requirement?

On Wed, Feb 11, 2015 at 9:08 AM, Aaron Heady (BING AVAILABILITY) <
aheady@microsoft.com> wrote:

>  Glad you clarified this in the context of the new policy model. In
> the previous model, I always expected the thing.js error would have been
> put into the ‘error array’ for widget.com and pulled by them at some
> point in the future if they were interested.
>

Rehash of our previous discussion, but... That's insufficient for many
cases. For example, I'm a popular image/meme site whose content is being
embedded across the web: the embedded resources are images (not scripts,
hence can't "self instrument"), and very few people (relatively speaking)
ever come to my site directly, instead they only see the embeds. As a
result, my ability to collect these reports is very limited. Worse, and
very likely, the embedded resources are also served from a different host
which is not a destination that visitors would actually go to (e.g.
static.cdn.mysite.com), and as a result I can't collect any error reports
ever. In short, we need policy-based registration + UA delivery :-)


> This clarifies that by saying widget.com registered their interest by
> setting the policy in the browser, so send widget.com the error info.
>

Right, and to make it concrete, if the embedded resource is served over
HTTPS:
- Embedded resource specifies an NEL policy in its headers alongside its
regular response
- UA registers the policy after successfully loading the embedded resource
for the first time (regardless of where it's embedded)
- If UA fails to load said resource in the future, it automatically beacons
the error report to the specified report-uri
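The three steps above can be sketched from the user agent's side. This is a minimal Python sketch under stated assumptions, not the draft's algorithm: the `NEL` header carrying a JSON object with `report-uri` and `max-age` keys, the `UserAgent` and `NelPolicy` names, and the error-type strings are hypothetical stand-ins chosen for illustration, and policy expiry is not modelled.

```python
import json
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class NelPolicy:
    report_uri: str  # collector chosen by the resource's host
    max_age: int     # seconds the policy stays registered (expiry not modelled here)


@dataclass
class UserAgent:
    # "Known NEL hosts": host name -> registered policy.
    policies: dict = field(default_factory=dict)
    # Reports queued for delivery: (collector URL, JSON payload) pairs.
    outbox: list = field(default_factory=list)

    def on_fetch_success(self, url, headers):
        """Register a policy, but only from a successful HTTPS response."""
        parts = urlsplit(url)
        raw = headers.get("NEL")  # hypothetical header format: a JSON object
        if parts.scheme != "https" or raw is None:
            return
        cfg = json.loads(raw)
        self.policies[parts.hostname] = NelPolicy(cfg["report-uri"], cfg["max-age"])

    def on_fetch_failure(self, url, error_type):
        """Beacon a report to the host's collector, wherever the resource was embedded."""
        policy = self.policies.get(urlsplit(url).hostname)
        if policy is None:
            return  # not a known NEL host: nothing to report
        report = {"url": url, "type": error_type}  # hypothetical report fields
        self.outbox.append((policy.report_uri, json.dumps(report)))
```

For the image/meme example: a successful load of `https://static.cdn.mysite.com/meme.png` carrying a policy registers `static.cdn.mysite.com`, and a later failed load of the same image (embedded on any site) queues one report for the collector named in that policy.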


> It does open up one possibility though. Could the NEL policy for
> widget.com indicate that it’s okay to share the thing.js error with the
> caller, example.com? Sort of a CORS attribute inside the policy that
> would allow partners to easily share error information with each other.
> That would standardize access to the error information and prevent the custom
> subresource work required to “instrument subsequent fetches and listen to
> onerror callbacks, etc.”
>
>
>
> That would expand your description:
>
> Assuming "navigation" restriction is removed, the workflow for above
> example would be:
>
> (a) widget.com registers an NEL policy *WITH CORS FOR example.com*
>
> (b) user visits example.com with widget.com resource that fails to load
>
> (c) user agent triggers an NEL report [2] to widget.com indicating an
> error
>
> *(d) user agent triggers an NEL report [2] to example.com indicating an
> error*
>

Yep, this is a really interesting use case. In particular, it seems like it
could go a long way towards helping site operators get a handle on the
reliability of third-party embeds - e.g. a widget provider is blocked or
malfunctioning and it's affecting the performance of my site (or, in the
worst case, is a SPOF for it). That said, I think we'd need to carefully
think through all the various scenarios here due to the cross-origin bits,
e.g. who opts in, is there an opt-out, restrictions on who gets those
reports, what data is shared, and so on.
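Aaron's step (d) can be sketched as a routing function. Everything here is hypothetical: the `share-with` policy field, the dictionary layout of `policies`, and the assumption that example.com receives its copy at the collector named in its *own* NEL policy are illustrative choices, not part of any draft.

```python
from urllib.parse import urlsplit


def collectors_for_failure(resource_url, page_origin, policies):
    """Return the collectors that receive a report for a failed subresource.

    policies maps a host name to a dict with a "report-uri" and an optional,
    hypothetical "share-with" list of page origins allowed to receive copies.
    """
    owner = policies.get(urlsplit(resource_url).hostname)
    if owner is None:
        return []  # resource host never registered a policy
    targets = [owner["report-uri"]]  # step (c): report to widget.com
    if page_origin in owner.get("share-with", []):
        embedder = policies.get(urlsplit(page_origin).hostname)
        if embedder is not None:
            targets.append(embedder["report-uri"])  # step (d): copy to example.com
    return targets
```

In this sketch sharing is opt-in by the resource owner alone, and the embedder only gets a copy if it also has a registered policy; whether both sides must opt in, and what data the copy contains, are among the open cross-origin questions listed above.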

As a starting point, I think the default policy is: if the UA failed to
fetch a resource, and this resource belongs to a "known NEL host", then the
user agent should beacon the error report to the collectors specified by
the NEL policy of that host. This applies regardless of whether the
resource fetch is a navigation or a subresource request, and in the latter
case regardless of the origin from where it is being loaded.
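The difference between the original scope and this proposed default can be sketched as a single predicate. A minimal sketch, assuming `known_nel_hosts` maps a host name to the collector URLs from its policy; the function name and the `navigation_only` flag are hypothetical, the flag standing in for the current draft's "navigation" restriction.

```python
from urllib.parse import urlsplit


def collectors_for(failed_url, is_navigation, known_nel_hosts, navigation_only=False):
    """Return the collectors that should receive an error report.

    navigation_only=True models the original draft's scope; False models the
    proposed default, which ignores the request type entirely.
    """
    if navigation_only and not is_navigation:
        return []  # old scope: subresource failures are never reported
    host = urlsplit(failed_url).hostname
    return list(known_nel_hosts.get(host, []))
```

The embedding page's origin is not a parameter at all: under the proposed default, only the failed resource's host decides where the report goes.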

ig


> *From:* Ilya Grigorik [mailto:igrigorik@google.com]
> *Sent:* Tuesday, February 10, 2015 4:23 PM
> *To:* public-web-perf
> *Subject:* [navigation-error-logging] remove "navigation" requirement?
>
>
>
> tl;dr: I propose we remove "navigation" from Navigation Error Logging.
>
>
>
> The original and the new drafts of NEL have been scoped to "navigation
> requests" with the premise that a failure during the navigation sequence is
> not observable and can't be logged by the application. By contrast (our
> current premise), the application *can* observe subresource failures - e.g.
> once the page loads the application can instrument subsequent fetches and
> listen to onerror callbacks, etc. Hence, we scoped NEL to navigations only.
>
>
>
> However, as I'm iterating on the spec, it's becoming more and more clear
> that the above premise is not true and is, in fact, very limiting:
>
>
>
> a) The application can observe failures (e.g. onerror callbacks) of
> subresource fetches, but it cannot get the same fidelity of information
> about the failure - e.g. detailed DNS/TCP/TLS errors, etc.
>
> b) A resource may belong to a "known NEL host" [1] but be embedded on a
> third-party origin: if said resource fails to load, there is no way for the
> NEL host to know (or even instrument) that a failure occurred.
>
>
>
> The (b) case is particularly painful. Consider the following example:
> widget.com provides a popular thing.js script that is embedded by many
> sites across the web... User visits example.com that embeds the
> widget.com/thing.js resource on its page but the resource fetch fails due
> to a TLS error. Today, the thing.js fetch falls outside the scope of "navigation
> request": no report is generated, widget.com remains in the dark about
> this issue... both example.com and widget.com are sad.
>
>
>
> Given the prevalence of third-party resources (and the fact that they are
> often a SPOF for many sites) the inability to address the above embedding
> use case with NEL is a huge gap.
>
>
>
> That said, the good news is, I think it's also easy to fix. We don't need
> "resource error logging", I think we just need to drop the "navigation"
> requirement from the current NEL draft. Everything else would remain the
> same and we wouldn't need to define yet another alternative mechanism to
> address (a) and (b) limitations described above.
>
>
>
> Assuming "navigation" restriction is removed, the workflow for above
> example would be:
>
> (a) widget.com registers an NEL policy
>
> (b) user visits example.com with widget.com resource that fails to load
>
> (c) user agent triggers an NEL report [2] to widget.com indicating an
> error
>
>
>
> Thoughts, objections?
>
>
>
> ig
>
>
>
> [1]
> https://w3c.github.io/navigation-error-logging/#policy-storage-and-maintenance
>
> [2]
> https://w3c.github.io/navigation-error-logging/#sample-navigation-error-report
>

Received on Wednesday, 11 February 2015 19:41:47 UTC