Re: [whatwg/url] Addressing HTTP servers over Unix domain sockets (#577) from Klaus Frank on 2023-07-10 (public-webapps-github@w3.org from July 2023)

From: Klaus Frank <notifications@github.com>
Date: Mon, 10 Jul 2023 07:39:46 -0700
To: whatwg/url <url@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/url/issues/577/1629110900@github.com>
I still think we need to extend the PEG with a way to specify the lower-layer protocols (i.e., to chain multiple schemes together), especially since HTTP can now also run over UDP and more and more things use HTTP as a transport/tunneling protocol...

So far, I think the following syntax would cover all of these (new) challenges and provide the needed flexibility while still being backward compatible with the current one:

`LowestLayer+HigherLayer+EvenHigherLayer://[[username]:[password]@EvenHigherLayerEndpointIdentifier]:[HigherLayerEndpointIdentifier]:[LowestLayerEndpointIdentifier]/resource`
(Square brackets around each attribute are optional, and lower layers fall back to default values if they are not specified explicitly in the URL. I would also recommend that implementations offer a strict parsing mode that never tries to guess anything: it only accepts URLs with square brackets around every attribute and with all data provided explicitly (no implied application ports, no implied lower-layer protocols, ...). That is mainly for security, future-proofing, and reliability when used by scripts and automation, as well as for debuggability by experts and prosumers.) Multiple (chained) endpoint identifiers would only be allowed in the verbose version (to avoid parsing bugs and ambiguity), and the endpoint identifiers would have to match the number of specified layers 1:1 (but in reversed order).
(The current `username:password@` would explicitly become part of the endpoint identifier of a single layer, for example the HTTP endpoint, so each layer can have its own independent login information or additional protocol-specific information. We'd just hand it off, as an opaque blob, to the protocol that the scheme specifies.)
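
To make the strict mode described above more concrete, here is a minimal Python sketch of how such a parser could look. It is only an illustration under my own assumptions: the function name `parse_layered_url` is hypothetical, it handles only the fully bracketed form (no defaults, no guessing), and it treats each endpoint identifier as an opaque string, as proposed above.

```python
from typing import List, Tuple


def parse_layered_url(url: str) -> Tuple[List[Tuple[str, str]], str]:
    """Parse the strict (fully bracketed) form of the proposed syntax.

    Returns ([(layer, endpoint), ...] ordered lowest layer first, resource).
    Hypothetical helper for illustration; not part of any existing spec.
    """
    scheme, sep, rest = url.partition("://")
    if not sep or not scheme:
        raise ValueError("missing scheme or '://'")
    layers = scheme.split("+")  # lowest layer comes first
    if any(not layer for layer in layers):
        raise ValueError("empty layer name in scheme")

    endpoints: List[str] = []
    pos = 0
    while True:
        if pos >= len(rest) or rest[pos] != "[":
            raise ValueError("strict mode: every endpoint must be bracketed")
        start = pos
        depth = 0
        # Track bracket depth so that nested brackets, colons (IPv6) and
        # slashes (socket paths) inside an endpoint are left untouched.
        while pos < len(rest):
            if rest[pos] == "[":
                depth += 1
            elif rest[pos] == "]":
                depth -= 1
                if depth == 0:
                    break
            pos += 1
        if depth != 0:
            raise ValueError("unbalanced square brackets")
        endpoints.append(rest[start + 1:pos])  # opaque blob for that layer
        pos += 1  # step past the closing bracket
        if rest.startswith(":[", pos):
            pos += 1  # step past the ':' separating two endpoints
            continue
        break

    resource = rest[pos:]
    if resource and not resource.startswith("/"):
        raise ValueError("unexpected characters after the last endpoint")
    if len(endpoints) != len(layers):
        raise ValueError("number of endpoints must match number of layers")

    # Endpoints are written highest layer first, i.e. in reversed order.
    return list(zip(layers, reversed(endpoints))), resource


if __name__ == "__main__":
    print(parse_layered_url(
        "ip+tcp+tls+http://[example.com]:[example2.com]:[443]:[2001:db8::1]/foo"
    ))
```

Because the sketch refuses anything that is not bracketed or where the counts differ, a URL like `tcp+http://example.com:80` is rejected rather than interpreted, which is the point of the strict mode.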


Examples:

|Stack|Example URI|Comment|
|---------|----------------------|--|
|TCP => HTTP|`tcp+http://[example.com]:[80]`|HTTP, but explicit|
|UDP => HTTP|`udp+http://[example.com]:[80]`|HTTP via UDP but explicit (no probing and no fallback to e.g. TCP)|
|TCP => TLS => HTTP|`tcp+tls+http://[example.com]:[example.com]:[443]`|As a side effect, having two separate schemes for the plain and the TLS-wrapped version is no longer necessary, but keeping the already specified ones for backward compatibility isn't an issue anyway|
|TCP => TLS => HTTP|`tcp+https://[example.com]:[443]`|Same, but with HTTPS instead of "tls+http"|
|UDP => HTTP => TCP|`udp+http+tcp://[48569]:[example.com]:[80]`|Specifies a raw TCP stream that is tunneled through HTTP which itself is served via a UDP connection|
|IP => TCP => TLS => HTTP|`ip+tcp+tls+http://[example.com]:[example2.com]:[443]:[2001:db8::1]/foo`|An IP connection to 2001:db8::1 is established that carries a TCP connection to port 443, which in turn carries a TLS connection¹. HTTP runs inside that, with `example.com` as the SNI header|
|HTTP => HTTP|`http+http://[example.com]:[username:password@example2.com]`|Authenticates against example2.com to use it as an HTTP proxy for connecting to example.com; this also avoids the current ambiguity of whether credentials are meant for the destination or the proxy|
|Unix Socket => HTTP|`socket+http://[example.com]:[/run/foo/bar.sock]/foobar`|Opens a Unix socket at /run/foo/bar.sock and sends example.com as the SNI name|

¹: With an explicitly specified hostname `example2.com` to use for certificate validation. Web browsers should raise an error if this differs from the HTTP SNI, and it should be possible to disable that error (in the options, not from the error message itself). But that's application behavior and shouldn't be part of the PEG, because for CLI tools, for debugging and development, or for web proxies like those universities use for off-campus online access to journals etc., such a mismatch is very much desirable.
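
As an illustration of what a client would have to do for the `socket+http` row above, here is a rough Python sketch built on the standard library's `http.client`. The class name `UnixHTTPConnection` is made up for this example, and it assumes that the HTTP-layer endpoint (`example.com`) ends up as the HTTP `Host` header, since this stack has no TLS layer that could carry an SNI name.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over a Unix domain socket (hypothetical helper for illustration).

    `socket_path` plays the role of the lower-layer endpoint and `host`
    the role of the HTTP-layer endpoint from the proposed URL syntax.
    """

    def __init__(self, socket_path: str, host: str = "localhost", timeout=None):
        super().__init__(host, timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        # Replace the usual TCP connect with a connect to the socket file.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def fetch(socket_path: str, host: str, resource: str) -> bytes:
    """Send one GET request over the socket and return the response body."""
    conn = UnixHTTPConnection(socket_path, host=host)
    try:
        conn.request("GET", resource)
        return conn.getresponse().read()
    finally:
        conn.close()
```

With this, `socket+http://[example.com]:[/run/foo/bar.sock]/foobar` would correspond to `fetch("/run/foo/bar.sock", "example.com", "/foobar")`. Nothing in the request has to be derived from the hostname, because the lower-layer endpoint is given explicitly.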



This extension (or, admittedly, proposal for a new version of the PEG) is my preferred improvement, as it does not break the independence of the different protocols and allows for extensibility, debuggability, and explicitness.




@lcampbel, your examples would have compatibility issues in the real world, as some servers make (not quite RFC-compliant) use of double slashes in the URL. I already had the unpleasant opportunity to debug such an issue in an API: requests just failed without the additional slash. Also, some people use ".localhost" for their local development environment. I've seen that with some k8s developers who ran a clone of the environment locally and used ".localhost" for the parts of the web app that would normally have been public (in the prod deployment), with everything below it representing its different subdomains. That works mainly because `*.localhost.` resolves to `127.0.0.1` and `::1` on almost all systems, regardless of how many subdomains one provides, and without the need for editing the hosts file or deploying an additional locally running DNS resolver with a special zone file...

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/577#issuecomment-1629110900
Received on Monday, 10 July 2023 14:39:53 UTC