Re: [whatwg/url] file: URL with a port number through the host setter (#97)

@annevk File URLs and ports/authority have a storied history in Chrome. Below are my notes from the last time I dug into this, which should at least explain the current behaviour and help figure out where to align:

### Can "file" have a host?
RFC 3986 Section 3.2.2 notes (in passing) that 
>  For example, the "file" URI
   scheme is defined so that no authority, an empty host, and
   "localhost" all mean the end-user's machine, whereas the "http"
   scheme considers a missing authority or empty host invalid.

If we dig back to RFC 1630, Page 18:
> There is clearly a danger of confusion that a link made to a local
   file should be followed by someone on a different system, with
   unexpected and possibly harmful results.  Therefore, the convention
   is that even a "file" URL is provided with a host part.  This allows
   a client on another system to know that it cannot access the file
   system, or perhaps to use some other local mecahnism to access the
   file.

and

> A void host field is equivalent to "localhost".
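As a rough illustration of that equivalence, here is a minimal Python sketch. The helper name `refers_to_local_machine` is hypothetical, and it relies on `urllib.parse.urlsplit` rather than any browser's parser, so treat it as a reading of the RFC text above rather than a description of what Chrome does:

```python
from urllib.parse import urlsplit


def refers_to_local_machine(url: str) -> bool:
    """Hypothetical helper: True if a file: URL names the end-user's machine.

    Per the RFC text quoted above, a missing authority, an empty host,
    and "localhost" are all treated as equivalent.
    """
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file: URL")
    # .hostname is None when the authority is missing or empty,
    # and is lowercased otherwise.
    host = parts.hostname
    return host is None or host == "localhost"
```

With this sketch, `file:/etc/hosts`, `file:///etc/hosts` and `file://localhost/etc/hosts` are all treated as local, while `file://example.com/etc/hosts` is not.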

### Can "file" have a port?
RFC 1738, Section 3.10, which updates RFC 1630 (and became the basis for RFC 3986) notes the file scheme as:
>    A file URL takes the form:
       file://<host>/<path>

Unlike other schemes (such as prospero or wais), which explicitly list the `:<port>` construction in their ABNF, file:// lacks this.

So, on to what Chrome's behaviour is:
- When canonicalizing a `file://` URL, our canonicalizer constructs it in the form `file://<host>/<path>?<query>#<ref>` ( https://cs.chromium.org/chromium/src/url/url_canon_fileurl.cc?rcl=0&l=88 ), meaning the port (and its colon) is always omitted when reserializing a file URL. If the host is empty, the authority component is empty too, resulting in the expected `file:///` triple slash.
- When parsing a `file://` URL, setting aside the 'windows' special logic (and the UNC path logic), our parser always ignores the port ( https://cs.chromium.org/chromium/src/url/url_parse_file.cc?rcl=1483532378&l=45 )
- `file:`-schemed URLs always yield `PORT_UNSPECIFIED` for the effective port, meaning a port should never end up serialized
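
To make the net effect of those three points concrete, here is a minimal Python sketch of the reserialization shape. It is not Chrome's code: the function name `canonicalize_file_url` is hypothetical, it uses `urllib.parse.urlsplit` instead of Chrome's parser, and it deliberately ignores the Windows drive-letter and UNC handling mentioned above.

```python
from urllib.parse import urlsplit


def canonicalize_file_url(url: str) -> str:
    """Hypothetical helper: reserialize as file://<host>/<path>?<query>#<ref>.

    Any ":<port>" in the authority is dropped, and an empty host yields
    the usual file:/// triple slash. Assumption: drive-letter and UNC
    special cases are out of scope for this sketch.
    """
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file: URL")
    # .hostname excludes any ":<port>" suffix; it is None for an empty authority.
    host = parts.hostname or ""
    path = parts.path or "/"
    out = f"file://{host}{path}"
    if parts.query:
        out += "?" + parts.query
    if parts.fragment:
        out += "#" + parts.fragment
    return out
```

Under these assumptions, `file://host:8080/a/b` comes back as `file://host/a/b`, and `file:///etc/hosts` is returned unchanged.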

To your question about what the right behaviour is: I suspect failing on `:` would probably be ideal, but I'm not in a position to change it anytime soon, simply because I don't have the time to own any fallout/regressions it might cause (however unlikely). It might be that one of my colleagues can own this, if it's believed to be important for compat. Not allowing a port is definitely a good thing (... and it seems like it'd require no work on Chrome's side, since that's what we do).

Allowing ports on file URLs seems to have the largest back-compat issues, at least re: spec precedent - it's seemingly long been forbidden - and it's also probably unlikely to happen in Chrome, if only because it would require a lot more monkeying about with the UNC & drive-letter sniffing logic.

https://github.com/whatwg/url/issues/97#issuecomment-270461963

Received on Wednesday, 4 January 2017 19:24:47 UTC