Re: [whatwg/url] Addressing HTTP servers over Unix domain sockets (#577)

@robin-aws 
> You're correct that it's also possible to encode the VSOCK information in the URI as http://1234567:123/.... But you don't have a clear indication that the authority uses the VSOCK address family, and it could easily be misinterpreted as an INET authority with 1234567 as a registered name instead.

No, it could *not* be misinterpreted as an INET authority, and "1234567" is *not* a valid top-level domain name.

From RFC 3986:
```
   The host subcomponent of authority is identified by an IP literal
   encapsulated within square brackets, an IPv4 address in dotted-
   decimal form, or a registered name.  The host subcomponent is case-
   insensitive.  The presence of a host subcomponent within a URI does
   not imply that the scheme requires access to the given host on the
   Internet.  In many cases, the host syntax is used only for the sake
   of reusing the existing registration process created and deployed for
   DNS, thus obtaining a globally unique name without the cost of
   deploying another registry.  However, such use comes with its own
   costs: ...
...
   The syntax rule for host is ambiguous because it does not completely
   distinguish between an IPv4address and a reg-name.  In order to
   disambiguate the syntax, we apply the "first-match-wins" algorithm:
   If host matches the rule for IPv4address, then it should be
   considered an IPv4 address literal and not a reg-name. ...

   A host identified by an Internet Protocol literal address, version 6
   [RFC3513] or later, is distinguished by enclosing the IP literal
   within square brackets ("[" and "]").  This is the only place where
   square bracket characters are allowed in the URI syntax.  In
   anticipation of future, as-yet-undefined IP literal address formats,
   an implementation may use an optional version flag to indicate such a
   format explicitly rather than rely on heuristic determination.
...
```
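The "first-match-wins" rule quoted above can be transcribed almost directly from the RFC 3986 ABNF for `dec-octet`, `IPv4address` and `reg-name`. What follows is a minimal Python sketch, not a complete validator. `classify_host` is a hypothetical helper name, and the contents of an IP-literal are not checked. For example, `1234567` falls through to the `reg-name` rule because RFC 3986 recognizes only the dotted-decimal form as an IPv4address.

```python
import re

# ABNF from RFC 3986, section 3.2.2, transcribed as regular expressions.
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"
IPV4ADDRESS = re.compile(rf"{DEC_OCTET}(?:\.{DEC_OCTET}){{3}}")
# reg-name = *( unreserved / pct-encoded / sub-delims )
REG_NAME = re.compile(r"(?:[A-Za-z0-9._~!$&'()*+,;=-]|%[0-9A-Fa-f]{2})*")


def classify_host(host: str) -> str:
    """Return which RFC 3986 host rule matches first (hypothetical helper)."""
    # Square brackets are only allowed around an IP-literal.
    if host.startswith("[") and host.endswith("]"):
        return "IP-literal"  # IPv6address / IPvFuture contents not validated here
    # First match wins: anything that fits IPv4address is never a reg-name.
    if IPV4ADDRESS.fullmatch(host):
        return "IPv4address"
    if REG_NAME.fullmatch(host):
        return "reg-name"
    raise ValueError(f"not a valid RFC 3986 host: {host!r}")


for h in ["192.0.2.1", "1234567", "256.0.0.1", "[::1]", "example.com"]:
    print(h, "->", classify_host(h))
```

Note that `256.0.0.1` also becomes a `reg-name`, because `256` fails the `dec-octet` rule.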
`man 3 inet`:
```
inet_aton() converts the Internet host address cp from the IPv4 numbers-and-dots notation
into binary form (in network byte order) and stores it in the structure that inp points to.
inet_aton() returns nonzero if the address is valid, zero if not.  The address supplied in cp can
have one of the following forms:

       a.b.c.d   Each of the four numeric parts specifies a byte of the address; the bytes are assigned
       in left-to-right order to produce the binary address.
...
```
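The `a.b.c.d` form described in the man page excerpt can be sketched in Python and compared against the standard-library `socket.inet_aton`, which wraps the C function. `dotted_quad_to_bytes` is a hypothetical helper that handles only the decimal four-part form quoted above. The C function may accept further forms (elided in the excerpt). Python also signals an invalid address by raising `OSError` rather than returning zero.

```python
import socket


def dotted_quad_to_bytes(cp: str) -> bytes:
    """Hypothetical helper: decimal a.b.c.d only. Each part is one byte,
    assigned left to right, which yields network (big-endian) byte order."""
    parts = cp.split(".")
    if len(parts) != 4 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"not in a.b.c.d form: {cp!r}")
    values = [int(p) for p in parts]
    if any(v > 255 for v in values):
        raise ValueError(f"part out of range 0..255: {cp!r}")
    return bytes(values)


packed = dotted_quad_to_bytes("192.0.2.1")
assert packed == socket.inet_aton("192.0.2.1")  # same 4 bytes as the C library
print(int.from_bytes(packed, "big"))  # 3221225985
```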
`man 7 ipv6`:
```
The address notation for IPv6 is a group of 8 4-digit hexadecimal numbers, separated with a ':'.
```
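This notation is why the RFC 3986 excerpt requires square brackets around an IPv6 literal: its ':' characters would otherwise collide with the host/port separator. The following is a short illustration using the standard-library `ipaddress` and `urllib.parse` modules. The address `::1` and port `8080` are arbitrary examples.

```python
import ipaddress
from urllib.parse import urlsplit

addr = ipaddress.IPv6Address("::1")
# Full notation: eight groups of four hex digits separated by ':'.
print(addr.exploded)    # 0000:0000:0000:0000:0000:0000:0000:0001
print(addr.compressed)  # ::1

# In a URI authority the literal must be bracketed (RFC 3986 IP-literal);
# the parser strips the brackets when reporting the hostname.
parts = urlsplit(f"http://[{addr.compressed}]:8080/status")
print(parts.hostname, parts.port)  # ::1 8080
```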
This reference to a "first-match-wins" algorithm is just another tacit assumption in RFC 3986: it presumes that an appropriate heuristic will be crafted, and the application author is on their own in crafting that algorithm.

We could apply much the same reasoning the RFC uses for the URI authority "host" (presuming that the application provides some heuristic to recognize it) to the interpretation of the URI authority "port", in order to distinguish a `*DIGIT` port from a socket path, as in `path ":"`. There is nothing "novel" in that. It would simply be polite to articulate such a presumption in the RFC itself, if it were to be adopted as a de facto standard by applications.
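A heuristic of the kind described above might look like the following minimal sketch. It assumes a hypothetical convention, not any adopted syntax: after `host:`, the text is either `*DIGIT` (a TCP port) or a socket path that starts with `/` and is terminated by `:`. The names `Authority` and `split_authority` are illustrative only. Userinfo, query and fragment are not handled.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Authority:
    host: str
    port: Optional[int] = None         # TCP port, when the part after ':' is *DIGIT
    socket_path: Optional[str] = None  # hypothetical Unix socket path form


def split_authority(rest: str) -> Tuple[Authority, str]:
    """Split the text after "scheme://" into (authority, request path).
    Hypothetical convention, not a standard."""
    m = re.match(r"\[[^\]]*\]|[^:/]*", rest)  # IP-literal, or up to ':' / '/'
    host, tail = m.group(0), rest[m.end():]
    if not tail.startswith(":"):
        return Authority(host), tail or "/"
    tail = tail[1:]
    # Port: one or more digits, followed by '/' or the end of the string.
    pm = re.match(r"\d+(?=/|$)", tail)
    if pm:
        return Authority(host, port=int(pm.group(0))), tail[pm.end():] or "/"
    # Socket path: '/' ... up to the next ':'.
    sm = re.match(r"(/[^:]*):", tail)
    if sm:
        return Authority(host, socket_path=sm.group(1)), tail[sm.end():] or "/"
    # RFC 3986 also allows an empty port ("host:/path").
    if tail == "" or tail.startswith("/"):
        return Authority(host), tail or "/"
    raise ValueError(f"neither a port nor a socket path: {tail!r}")


print(split_authority("localhost:8080/status"))
print(split_authority("localhost:/var/run/app.sock:/status"))
```

This ordering has one consequence. `host:/a:b`, which RFC 3986 reads as an empty port followed by the path `/a:b`, is taken here as the socket path `/a`. A specification would need to resolve that case, and socket paths that themselves contain ':', explicitly.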


-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/577#issuecomment-2606065839

Message ID: <whatwg/url/issues/577/2606065839@github.com>

Received on Wednesday, 22 January 2025 01:10:09 UTC