- From: James <notifications@github.com>
- Date: Tue, 21 Jan 2025 17:10:05 -0800
- To: whatwg/url <url@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/url/issues/577/2606065839@github.com>
@robin-aws

> You're correct that it's also possible to encode the VSOCK information in the URI as http://1234567:123/.... But you don't have a clear indication that the authority uses the VSOCK address family, and it could easily be misinterpreted as an INET authority with 1234567 as a registered name instead.

No, it could *not* be misinterpreted as an INET authority. And "1234567" is *not* a valid top-level domain name.

From RFC 3986:

```
The host subcomponent of authority is identified by an IP literal
encapsulated within square brackets, an IPv4 address in dotted-decimal
form, or a registered name. The host subcomponent is case-insensitive.
The presence of a host subcomponent within a URI does not imply that
the scheme requires access to the given host on the Internet. In many
cases, the host syntax is used only for the sake of reusing the
existing registration process created and deployed for DNS, thus
obtaining a globally unique name without the cost of deploying another
registry. However, such use comes with its own costs:
...
...
The syntax rule for host is ambiguous because it does not completely
distinguish between an IPv4address and a reg-name. In order to
disambiguate the syntax, we apply the "first-match-wins" algorithm:
If host matches the rule for IPv4address, then it should be
considered an IPv4 address literal and not a reg-name.
...
A host identified by an Internet Protocol literal address, version 6
[RFC3513] or later, is distinguished by enclosing the IP literal
within square brackets ("[" and "]"). This is the only place where
square bracket characters are allowed in the URI syntax. In
anticipation of future, as-yet-undefined IP literal address formats,
an implementation may use an optional version flag to indicate such a
format explicitly rather than rely on heuristic determination.
...
```

`man 3 inet`:

```
inet_aton() converts the Internet host address cp from the IPv4
numbers-and-dots notation into binary form (in network byte order)
and stores it in the structure that inp points to. inet_aton()
returns nonzero if the address is valid, zero if not. The address
supplied in cp can have one of the following forms:

a.b.c.d   Each of the four numeric parts specifies a byte of the
          address; the bytes are assigned in left-to-right order to
          produce the binary address.
...
```

`man 7 ipv6`:

```
The address notation for IPv6 is a group of 8 4-digit hexadecimal
numbers, separated with a ':'.
```

This reference to a "first-match-wins" algorithm is just another tacit assumption of RFC 3986, which presumes that an appropriate heuristic will be crafted. The application author is on their own in crafting that algorithm.

We could apply much the same reasoning (presuming some application-provided heuristic for recognizing the URI authority "host") to the interpretation of the URI authority "port", to distinguish a "*DIGIT" from a socketpath, as in ' path ":" '. There is nothing "novel" in that. It would simply be polite to articulate such a presumption in the RFC itself, if it were to be adopted as a de facto standard in applications.

-- 
View it on GitHub: https://github.com/whatwg/url/issues/577#issuecomment-2606065839
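
For illustration, the "first-match-wins" rule quoted from RFC 3986, and the analogous treatment of the port suggested above, might be sketched roughly as follows. This is only a minimal sketch, not anything the RFC or an existing library defines: the function names `classify_host` and `classify_port` and the label `"socketpath"` are hypothetical, the bracket check does not validate the contents of an IP literal, and anything that falls through to `"reg-name"` is not checked against the reg-name character set.

```python
import re

# dec-octet per the RFC 3986 ABNF: 0-255 with no leading zeros.
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
# IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
IPV4_ADDRESS = re.compile(rf"{DEC_OCTET}(?:\.{DEC_OCTET}){{3}}")


def classify_host(host: str) -> str:
    """First match wins: IP-literal, then IPv4address, then reg-name."""
    # Square brackets are only allowed around an IP literal.
    if host.startswith("[") and host.endswith("]"):
        return "IP-literal"
    # If the host matches IPv4address, it is not treated as a reg-name.
    if IPV4_ADDRESS.fullmatch(host):
        return "IPv4address"
    # Fallback; this sketch does not validate reg-name characters.
    return "reg-name"


def classify_port(port: str) -> str:
    """Hypothetical analogue for the port: *DIGIT first, else a socket path."""
    # "*DIGIT" also matches the empty string, as in the RFC 3986 port rule.
    if re.fullmatch(r"[0-9]*", port):
        return "port"
    # Assumption: anything else is handed to the application as a socket path.
    return "socketpath"
```

With these definitions, `classify_host("192.168.0.1")` returns `"IPv4address"`, `classify_host("[::1]")` returns `"IP-literal"`, and `classify_port("8080")` returns `"port"`. The order of the checks is the whole heuristic, which is exactly the part the RFC leaves to the application.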
Received on Wednesday, 22 January 2025 01:10:09 UTC