Re: [whatwg/fetch] CORB: nosniff handling (#686) from Lukasz Anforowicz on 2018-03-28 (public-webapps-github@w3.org from March 2018)

From: Lukasz Anforowicz <notifications@github.com>
Date: Wed, 28 Mar 2018 21:10:23 +0000 (UTC)
To: whatwg/fetch <fetch@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/fetch/pull/686/c377037793@github.com>
> Are you planning on changing the implementation to use a network error for the cases enumerated in the PR?

Before sending out this PR I talked with @nick-chromium and we think it should be fine to change the implementation to inject a network error when blocking images/audio/video/fonts (but keep filtering the response for other cases).  But, we thought (and I still do) that this is unnecessary because the difference won't be observable.  Thanks @domenic for pointing out the link rel=preload as=image case - I'll try adding WPT tests for this.  I do note that Blink raises link.onerror event for _both_ load and decode errors (see [blink::Resource::ErrorOccurred](https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/loader/fetch/Resource.h?type=cs&q=blink::Resource::ErrorOccurred&sq=package:chromium&l=253) and [blink::LinkLoader::NotifyFinished](https://cs.chromium.org/search/?q=LinkLoader::NotifyFinished&sq=package:chromium&type=cs)), so there still seems to be no observable difference between net errors and decode (e.g. empty body) errors.  FWIW, I am still trying to play with link/preload/image case here: https://crrev.com/c/984338.

This PR deals with images (and audio/video/fonts/~~tracks~~), but if avoiding network errors might be important for other destinations, then we might anyhow be forced to _eventually_ cover this aspect of CORB behavior in the Fetch spec.  FWIW, @csreis pointed out that the initial CORB implementation intentionally avoided injecting network errors to avoid disrupting cases where a webpage issues a request that 1) the webpage expects to succeed, but where 2) the webpage doesn't care about the response.  I am guessing that the focus was on XHR / fetch cases in `no-cors` mode, but maybe @csreis can provide more specific examples.

Anyway - if we need to anyhow introduce a concept of a CORB-filtered-response in the long-term (e.g. when specifying how to handle/block 206 responses for XHR/fetch), then maybe we should use this concept consistently, rather than have the spec and the implementation inject network errors in some cases and filter response headers+body in other cases.  QUESTION: What are your thoughts on introducing the concept of a CORB-filtered-response into the Fetch spec?  If we went down that route, then would we need a separate nosniff-for-CORB section (separate from the existing [section here](https://fetch.spec.whatwg.org/#should-response-to-request-be-blocked-due-to-nosniff?) and with a separate reference/step/item in the [main fetch algorithm](https://fetch.spec.whatwg.org/#main-fetch))?  It seems kind of unfortunate that the list of headers on CORS-filtered-response is different from the CORB-filtered-response (the reason is benign - quality of some CORS-related error messages suffered when CORB filtered out CORS-related response headers).

I am not sure what is the best course of action here.  Let me try to play with WPT tests and maybe this will help me get more familiar with potential network-error VS filtered-response differences.  So far we don't know of a way to observe net-error-VS-empty-body difference for _images_, so maybe we should proceed with the current PR proposal.  WDYT?

> It's intentional that this only happens if the server defined a X-Content-Type-Options: nosniff header for the resource? Doing it more generally breaks too much?

I am not sure if I understood the question, but let me try to reply below.

Blocking just based on the `Content-Type` header would break too much _without_ _sniffing_ done by CORB to confirm that Content-Type matches the response body.  We estimate that without sniffing CORB would block if sniffing were omitted, [almost 20% of responses would be blocked](https://chromium.googlesource.com/chromium/src/+/master/services/network/cross_origin_read_blocking_explainer.md#quantifying-corb-impact-on-existing-websites).

Blocking based on the `Content-Type` header _and_ confirmation sniffing is needed for CORB to protect as much resources as possible.  But... blocking of images that sniff as HTML is not observable to the web, so I don't think this aspect of CORB needs be be covered by the web specs.

Also note that I understand that the changes in this PR don't cover all aspects of CORB (e.g. handling of 206 and/or blocking based on sniffing for JSON security prefix / parser breaker).  I thought that I can tackle spec changes in small steps and can leave 206+sniffing for later (nosniff is by far the biggest class of scenarios with observable CORB impact in the wild).  I am also hesitant to dive into specifying CORB sniffing - I was hoping to avoid having a formal spec for this (sniffing is not needed to define CORB behavior for 206 and nosniff;  sniffing can be seen as an implementation detail - if an implementation convinces itself that a JSON security prefix never appears in images or media then it can block without any observable impact).

> You include "track", but per the HTML Standard browsers are already required to be strict for its fetches, irrespective of nosniff. That's not the case in implementations?

@annevk, can you please clarify which specs require track fetches to be strict wrt the Content-Type response header?  I tried looking at https://html.spec.whatwg.org/multipage/media.html#sourcing-out-of-band-text-tracks and I notice that it says:

> This specification does not currently say whether or how to check the MIME types of text tracks, or whether or how to perform file type sniffing using the actual file data. Implementors differ in their intentions on this matter and it is therefore unclear what the right solution is. In the absence of any requirement here, the HTTP specification's strict requirement to follow the Content-Type header prevails ("Content-Type specifies the media type of the underlying data." ... "If and only if the media type is not given by a Content-Type field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URI used to identify the resource.").

Is this the part of the spec that you meant?  (asking mainly so that I can refer to the correct spec in CORB explainer)

FWIW, I tried looking at wpt/html/semantics/embedded-content/media-elements/track/track-element/, but I couldn't identify tests that check the Content-Type here.  Therefore I am not sure how to easily verify Chromium's behavior :-/.  Also - not sure if it is worth opening a bug (somewhere?) to add WPT test coverage here.

> What https://cs.chromium.org/chromium/src/services/network/cross_origin_read_blocking.cc?q=%22json%2B%22&sq=package:chromium&dr=C&l=83 considers JSON is a superset of what this PR considers JSON. Is that intentional? (I think this also goes for XML.)

Thanks for reminding me about this issue.  I think the we should make sure that Chromium's CORB implementation and the spec definitions of [HTML](https://mimesniff.spec.whatwg.org/#html-mime-type) / [XML](https://mimesniff.spec.whatwg.org/#xml-mime-type) / [JSON](https://mimesniff.spec.whatwg.org/#json-mime-type) MIME types should agree.  I probably should put together a Chromium CL that 1) changes Chromium implementation, 2) adds WPT tests for CORB-ing or not-CORB-ing specific MIME types.  The Chromium CL would have to be reviewed by other security folks (e.g. xtofian@) who were advising us on what MIME types to cover (I _think_ they would be okay with the changes, but I'd have to double-check about `application/json+protobuf` - AFAIR this was what prompted https://crrev.com/c/850551/).  For now I've opened https://crbug.com/826756 to track this.

QUESTION: What are your thoughts on WPT coverage of MIME type definitions?  Do you think I can/should cover more MIME types by tests similar to wpt/fetch/corb/img-png-mislabeled-as-html-nosniff.tentative.sub.html?  I could switch from using hardcoded response headers for wpt/fetch/corb/resources/png-mislabeled-as-html-nosniff.png to having something parametrized via URL / php.  OTOH, this particular test seems a bit icky to me - CORB's effects are only indirectly visible and force test verification to be done via reference page/image comparison...

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/fetch/pull/686#issuecomment-377037793
Received on Wednesday, 28 March 2018 21:10:50 UTC