[whatwg/fetch] Handling of invalid Location header characters doesn't match browsers (#883)

We recently regressed some handling of invalid Location headers in Chrome. In digging into that, I noticed the Fetch spec doesn't really match browsers here. I should also note I'm not very familiar with the structure of these specifications, so it's certainly possible I'm reading some of them wrong.

Fetch appears to define handling of the Location header as (https://fetch.spec.whatwg.org/#http-fetch, steps 5.2 and 5.3):
> Let location be the result of [extracting header list values](https://fetch.spec.whatwg.org/#extract-header-list-values) given `Location` and actualResponse’s header list.
> If location is a [value](https://fetch.spec.whatwg.org/#concept-header-value), then set location to the result of [parsing](https://url.spec.whatwg.org/#concept-url-parser) location with actualResponse’s URL.

Note, in particular, that "extracting header list values" is defined in terms of the headers ABNF. I don't see a Location ABNF in Fetch, so I assume this is intended to be the one in [RFC 7231](https://tools.ietf.org/html/rfc7231#section-7.1.2). That cites various URI-related ABNFs in [RFC 3986](https://tools.ietf.org/html/rfc3986#appendix-A), which implies only a subset of ASCII as allowed, and instead other things should have already been percent-encoded. Inputs that don't match appear to be an error ("extract header values" returns an error when the ABNF does not match). Is that the intended reading?

This doesn't appear to match what is actually implemented by browsers or necessary for web compatibility. In particular, we had a report that users editing ru.wikipedia.org, until recently, would be served redirects where Cyrillic characters in the path were escaped, but those in the fragment were not.

**Some interesting cases to test:**
*Note: when testing Chrome, test with Chrome canary, specifically 75.0.3735.0 or later. 73 has a bug. 74 should have the fix in not too long but currently does not.*
`\xff` refers to a byte with value 0xff. `%ff` refers to three bytes with values `%`, `f`, and `f`.
* `Location: https://example.com/#\xe2\x98\x83` (some unescaped UTF-8 character)
* `Location: https://example.com/#\xe2\x98\x83%e2%98%83` (mix of escaped and unescaped)
* `Location: https://example.com/#%\xe2\x98\x83` (something after a %)
* `Location: https://example.com/#\xff` (UTF-8 encoding error)

Chrome (pre-73 and 75+), Firefox, and Safari all appear to resolve these to:
* `https://example.com/#%E2%98%83`

* `https://example.com/#%E2%98%83%e2%98%83`

* `https://example.com/#%%E2%98%83`

* `https://example.com/#%FF`


(I haven't tested Edge and IE.)

That last case is particularly interesting. Chrome internally percent-encodes all non-ASCII header bytes before passing the string on to URL handling logic. (This logic is very very old, so I don't know what originally motivated it.)

This seems a goofy way to specify it, but the obvious alternative doesn't seem to work. Suppose Fetch just said all Location headers were valid and you just pass the value to the URL parsing algorithm, I don't think that's what would come out. The URL [parsing](https://url.spec.whatwg.org/#concept-url-parser) algorithm takes a [string](https://infra.spec.whatwg.org/#string) input, not bytes, and seems to primarily percent-encode as [UTF-8](https://url.spec.whatwg.org/#utf-8-percent-encode). Interestingly, in the quoted text above, Fetch passes a [value](https://fetch.spec.whatwg.org/#concept-header-value) to that function, which is a byte sequence, not a string. And, indeed if I remove the funny preprocessing step in Chrome, `%FF` turns into `%EF%BF%BD` (U+FFFD).

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/fetch/issues/883

Received on Tuesday, 19 March 2019 21:22:58 UTC