Re: [navigation-error-logging] new draft proposal from Ilya Grigorik on 2015-01-13 (public-web-perf@w3.org from January 2015)

From: Ilya Grigorik <igrigorik@google.com>
Date: Mon, 12 Jan 2015 16:40:39 -0800
To: "Aaron Heady (BING AVAILABILITY)" <aheady@microsoft.com>
Cc: public-web-perf <public-web-perf@w3.org>, Domenic Denicola <domenic@google.com>
Message-ID: <CADXXVKrGVH0m3kEfBRsMZMtgJs_GwYbV_Wo8Gy3+KWMnGPxeHw@mail.gmail.com>
Hi Aaron, thanks for the feedback! Inline..

On Mon, Jan 12, 2015 at 2:22 PM, Aaron Heady (BING AVAILABILITY) <
aheady@microsoft.com> wrote:

>  I’ll preface this with: I’m not a fan of the Delivery Policy idea, I
> prefer that the NEL details are in an array like the performance timing
> entries are and just accessed via client side script.
>
>
>
> That said, removing the .js API is really bad. I don’t want .js based
> registration of the Delivery Policy, I just want .js based access to the
> NavigationErrorLog object array so I can read it client side at
> will. How is this different than accessing performance timing info via .js?
> We can also have policy via headers, but that leads to these questions.
>

We *have to* provide non-JS delivery to facilitate real-time + reliable
reporting:
(a) a pure JS solution cannot deliver real-time reports since, by
definition, a navigation must have succeeded for the script to run.. it
only enables after-the-fact reporting.
(b) after-the-fact reporting requires that the user comes back later and
that the load succeeds: this can happen with an arbitrary delay (user's
decision), or not at all - e.g. I click on a search result or a link in
some article, it fails to load, I never come back, and the report is never
delivered.

As a result, I think a JS API is of very limited value at best. In practice
you'd want the UA to deliver the reports in the background on an
as-it-happens basis.

Further, adding a JS API exposes new complications:
(a) it's not clear how long the UA should retain these navigation error
logs. This could add a lot of overhead if the user is experiencing poor
connectivity and is hitting a lot of errors.
(b) we're back to the same problem of a shared buffer and races between
various scripts: either you have to diff the nav error logs and avoid
clearing the buffers (overhead), or you clear but run the risk of other
subscribers missing items (granted, this is not a new issue...)
(c) any script (including third party) can iterate over your navigation
error logs.. which exposes additional private data about the user and
their network without (in my opinion) adding much value due to all of the
reasons above.

... that said, I'm open to being convinced otherwise. ATM I just don't see
any real-world deployments actually using the JS API for all of the reasons
above, plus the additional privacy complications for the user (note that
CSP does not expose access to error reports for the same reasons --
consistency with other APIs is another argument).

> The server delivers the NEL policy
> <https://cdn.rawgit.com/w3c/navigation-error-logging/new/index.html#dfn-nel-policy>
> to the user agent via an HTTP response header field (NEL header field
> <https://cdn.rawgit.com/w3c/navigation-error-logging/new/index.html#dfn-nel-header-field>).
> The policy *MUST* be delivered over a secure transport. If the policy is
> delivered over a secure transport with no underlying secure transport
> errors or warnings, and its format conforms to the specified grammar, the
> user agent *MUST* either:
>
>
>
> For above, Policy Delivery and Processing: By requiring secure delivery
> you have the added burden of setting up a secure channel just to deliver
> the policy on a normal HTTP page. This will have to execute as a background
> resource on every response we serve because we won’t know if the client has
> the delivery policy directive already, unless we add a cookie that tracks
> the expiration date, etc… Should there be a request header indicating NEL
> enrollment so that everyone doesn’t have to roll their own tracking
> mechanism? Maybe NEL-max-age: 360 from the client would say I’m enrolled
> and have 360 seconds left until it expires. (bad header name, but you get
> the idea)
>

For the NEL request header: the client would have to send it on every
request to a known NEL host, which is no different (modulo a few header
bytes) from the server always appending the NEL policy response header.
Adding the "time until expires" is also another form of a cookie, which is
something I'd like to avoid. I'm not convinced we need this.


> But probably a bigger issue: This also skips the fact that some CDNs have
> clients (domains) set up on an HTTP only network where DNS resolves to a
> server that can’t/won’t host SSL; SSL/TLS is on a different IP block. Thus
> if you can’t get a certificate tied to your domain, you can’t issue policy
> for that domain. If I’m on an HTTP only CDN network for example.com, how
> do I get a policy issued to that domain via a secure connection?
>

I think NEL qualifies as a "powerful feature" [1], hence HTTPS registration
is required - e.g. we don't want a MITM/downgrade attack to be able to
hijack your error reports. That said, note that once registered the policy
would apply to both HTTPS and HTTP schemes for that origin -- e.g. an HTTP
site can make a background HTTPS request (XHR, iframe, etc) to register the
policy; you don't have to be HTTPS-only to take advantage of NEL, you only
need HTTPS to manage the registration.

[1]
https://w3c.github.io/webappsec/specs/powerfulfeatures/#is-feature-powerful
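
To make the registration rule concrete, here is a rough Python sketch of how
a UA might behave. It is not taken from the draft: the names `policies`,
`register_policy` and `policy_for` are made up for illustration, the header
value is treated as an opaque string, and policies are keyed by host name
only, which is my simplification of "applies to both schemes".

```python
from urllib.parse import urlsplit

# Hypothetical in-memory store: NEL host name -> raw policy string.
policies = {}


def register_policy(response_url, nel_header):
    """Store the policy only if the response was fetched over HTTPS."""
    parts = urlsplit(response_url)
    if parts.scheme != "https" or parts.hostname is None:
        # Policies seen on insecure transport are ignored, so a MITM on
        # plain HTTP cannot redirect the error reports.
        return False
    policies[parts.hostname] = nel_header
    return True


def policy_for(navigation_url):
    """Look up the policy by host, regardless of the navigation's scheme."""
    host = urlsplit(navigation_url).hostname
    return policies.get(host)
```

With this shape, a policy registered via a background request to
https://example.com/ would also be found for a later failed navigation to
http://example.com/, while a header seen on an http:// response is dropped.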


>
>
> *2.1.1.3 The **includeSubDomains** Directive*
>
> The OPTIONAL *includeSubDomains* directive is a valueless directive that,
> if present, signals the user agent that the NEL policy
> <https://cdn.rawgit.com/w3c/navigation-error-logging/new/index.html#dfn-nel-policy>
> applies to this NEL host
> <https://cdn.rawgit.com/w3c/navigation-error-logging/new/index.html#dfn-nel-host>
> as well as any subdomains of the host's domain name.
>
> For *includeSubDomains*, what’s the consequence if issued on
> foo.example.com? It should then work for *.foo.example.com, but if
> subsequently example.com issues its own includeSubDomains, does that
> overwrite foo.example.com, thus *.foo also?
>

I believe (2.2) should address this: "The user agent must maintain the NEL
policy of any given NEL host separately from any NEL policies issued by any
other NEL hosts whose domain names are superdomains or subdomains of the
given NEL host's domain name. Only the given NEL host can update or cause
deletion of its NEL policy"... same logic as HSTS.
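
A small Python sketch of what "maintained separately" could look like in a
UA's policy store. The function names and the dictionary layout are
hypothetical. Note one loud assumption: the text quoted above only says that
policies are stored and updated independently; picking the most specific
matching host at lookup time is my own choice for the sketch, not something
the draft states.

```python
# Hypothetical store: NEL host -> {"include_subdomains": bool, "report_uri": str}
policies = {}


def set_policy(host, policy):
    """Only the entry for `host` itself is created or replaced."""
    policies[host] = policy


def delete_policy(host):
    """Only the entry for `host` itself is removed."""
    policies.pop(host, None)


def lookup(host):
    """Return (nel_host, policy) for the closest applicable policy, or None.

    Assumption: walk from the host itself up through its superdomains and
    take the first policy that is either an exact match or was issued with
    includeSubDomains.
    """
    labels = host.split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        policy = policies.get(candidate)
        if policy is None:
            continue
        if i == 0 or policy["include_subdomains"]:
            return candidate, policy
    return None
```

In this sketch, example.com issuing its own includeSubDomains policy after
foo.example.com has done so leaves the foo.example.com entry untouched, so
a.foo.example.com keeps resolving to the foo.example.com policy until that
host itself updates or deletes it.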


> Each report URI
> <https://cdn.rawgit.com/w3c/navigation-error-logging/new/index.html#dfn-report-uri>
> in the provided set of report URIs
> <https://cdn.rawgit.com/w3c/navigation-error-logging/new/index.html#dfn-set-of-report-uris>
> *MUST* use a secure transport to receive the NEL reports. If any of the
> provided report URI's
> <https://cdn.rawgit.com/w3c/navigation-error-logging/new/index.html#dfn-report-uri>
> does not use a secure transport, the user agent *MUST* ignore the
> provided policy. The process of sending navigation error reports to the
> specified URI's in this directive's value is defined in this documents 2.3
> Reporting
> <https://cdn.rawgit.com/w3c/navigation-error-logging/new/index.html#reporting>
> section.
>
>
>
> If the original user navigation, with all of the potential personal
> payload, doesn’t have to be secure, why does NEL telemetry have to be
> secure? Mind you, I like TLS and want to secure things. Just wondering why
> it is being dictated in this scenario. It also seems like it’s going to
> drive prices up on telemetry monitoring endpoints, 3rd party or in house.
>

See: https://w3c.github.io/webappsec/specs/powerfulfeatures/

>  The *REQUIRED* report-uri directive specifies a URI to which the user
> agent sends reports about navigation errors. The ABNF grammar for the name
> and value of the directive is:
>
> The *REQUIRED* max-age directive specifies the number of seconds, after
> the reception of the NEL header field, during which the user agent regards
> the host (from whom the
>
> Since both report-uri and max-age are required, what if we are just
> disabling the policy by  setting max-age to 0? Will not having a report-uri
> header cause the request to be invalid along the lines of the “*MUST*
> ignore ….that does not conform” comments earlier in the doc. Should uri
> only be required if max-age is present and >0?
>

Right, good catch. This was an editorial shortcut on my part; I think we
should allow the simple "NEL: max-age=0" as a valid unregister policy.
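
Roughly, the header processing would then look like the Python sketch below.
This is an illustration only: `parse_nel` and its return values are made up,
and the parsing assumes simple ";"-separated `name=value` directives rather
than the draft's actual ABNF (e.g. it does not handle a ";" inside a quoted
report-uri).

```python
import re


def parse_nel(header):
    """Classify a NEL header value as "register", "unregister" or "ignore"."""
    directives = {}
    for part in header.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"')

    max_age = directives.get("max-age", "")
    if not re.fullmatch(r"[0-9]+", max_age):
        return ("ignore", None)  # max-age is always required

    if int(max_age) == 0:
        # Proposed relaxation: no report-uri needed to drop the policy.
        return ("unregister", None)

    report_uri = directives.get("report-uri", "")
    if not report_uri.startswith("https://"):
        # Missing or insecure report-uri: the whole policy is ignored.
        return ("ignore", None)

    return ("register", {
        "max_age": int(max_age),
        "report_uri": report_uri,
        "include_subdomains": "includesubdomains" in directives,
    })
```

So `parse_nel("max-age=0")` is treated as an unregister request, while a
non-zero max-age without a secure report-uri is still ignored.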

ig


>
>
>
>
> *From:* Ilya Grigorik [mailto:igrigorik@google.com]
> *Sent:* Monday, January 12, 2015 12:52 PM
> *To:* public-web-perf
> *Cc:* Domenic Denicola
> *Subject:* [navigation-error-logging] new draft proposal
>
>
>
> We identified a number of issues with the current NEL draft at TPAC:
>
>
>
> 1) JS-based registration can be easily hijacked
>
> 2) Ability to aggregate multiple errors into a single report
>
> 3) Desire for more extensive error coverage and better delivery model
>
> ... more: https://github.com/w3c/navigation-error-logging/issues
>
>
>
> In an attempt to address all of the above, I have a new draft proposal
> which is based on our experience with Domain Reliability [1], and also
> reuses a lot of the concepts from CSP and HSTS:
>
>
>
> https://cdn.rawgit.com/w3c/navigation-error-logging/new/index.html
>
>
>
> - HSTS~like header based registration
>
> - CSP~like error reporting for failed navigations
>
> -- JS interface is removed entirely for security and privacy reasons, same
> as CSP
>
> - Domain Reliability~like error types and report structure and delivery
>
>
>
> In short, it *is* a significant departure from the current draft, but I do
> believe that it addresses all the major open issues and provides a
> consistent interface to similar APIs (e.g. CSP).
>
>
>
> Would love to hear any thoughts or feedback!
>
>
>
> ig
>
>
>
> [1]
> https://docs.google.com/a/chromium.org/document/d/14U0YA4dlzNYciq2ke0StEMjomdBUN6ocSt1kN03HJ0s/edit?pli=1#
> <https://docs.google.com/a/chromium.org/document/d/14U0YA4dlzNYciq2ke0StEMjomdBUN6ocSt1kN03HJ0s/edit?pli=1>
>
>
>
>
>
>
>
Received on Tuesday, 13 January 2015 00:41:51 UTC