Re: [whatwg/url] Support IPv6 zone identifiers (#392) from Karl on 2023-01-19 (public-webapps-github@w3.org from January 2023)

From: Karl <notifications@github.com>
Date: Thu, 19 Jan 2023 07:00:53 -0800
To: whatwg/url <url@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/url/issues/392/1397110482@github.com>

IMO, we should support Zone IDs.

Fundamentally, no host has a universally-guaranteed meaning. The URL standard does not define what hosts actually mean, and generally the assumption is that they will be passed to a system _resolver_.

How that resolver works is undefined: different systems do different things, and let the user customise different parts of the process. For instance, the `hosts` file can be used to provide a custom mapping, and after that the system may search the local network or other sources before falling back to DNS. (The Windows [`GetAddrInfoEx` function](https://learn.microsoft.com/en-us/windows/win32/api/ws2tcpip/nf-ws2tcpip-getaddrinfoexa), for example, claims to support not only DNS, but also NetBIOS, WINS, Bluetooth, and various peer-to-peer protocols.) But generally, after consulting local sources, the resolver will query DNS.

DNS itself can be heavily customised - both by the user, and by the backend. Users can provide custom DNS servers (e.g. [Google public DNS](https://developers.google.com/speed/public-dns)), and ISPs can direct queries to particular servers using dedicated physical infrastructure or [on-site caches](https://arstechnica.com/information-technology/2022/10/redditor-acquires-decommissioned-netflix-cache-server-with-262tb-of-storage/), or redirect them to [alternate websites](https://en.wikipedia.org/wiki/DNS_hijacking) (let's imagine the state has a problem with website X and wants to send users to a more ideologically-appropriate site). Ultimately, we have no way to detect any of that. We have no idea what the hostname `example.com` actually means, or whether the result obtained by a specific client's resolution process accurately reflects what the author of the URL intended. And in modern networks, where devices are mobile, generally suspend rather than shut down, and may be negotiating between various WiFi and cellular networks, network configurations can easily fluctuate within the lifetime of a single process, meaning the identity of a resolved name is constantly in flux.

IP addresses are similarly fuzzy. Two machines with different network configurations may have different understandings of what a given address should mean. We give an IP address to the system, and it connects to some machine, and that's about as much as we can say about it. It doesn't come with _nearly_ as much ambiguity as domains have, but it's all still client-specific.

So when I see arguments such as:

> Inclusion of purely local information in the *universal* identity of a resource runs directly counter to the point of having a URI.

And

> the Web security model depends on having a clear definition for the origin of resources. The definition of Origin depends on the representation of the hostname and it relies heavily both on uniqueness (something a zone ID potentially contributes toward) and consistency across contexts (which a zone ID works directly against)

I think they overstate how much we can _actually_ rely on existing hostnames to be unique, and they fail to explain how `10.0.0.1` and `[::abcd]` constitute a "universal identity" which is "consistent across contexts" but `[::abcd%eth0]` somehow is neither.
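To make the point about `10.0.0.1` concrete, here is a minimal sketch using Python's standard `ipaddress` module. The helper name `describe` and the sample address `fe80::1%eth0` are made up for illustration, and dropping the zone ID before classifying is just a choice made in this sketch. All it shows is that addresses we already accept without a zone can be private or link-local, so their meaning already depends on the network the client sits on.

```python
import ipaddress


def describe(literal: str) -> str:
    """Roughly label how far the meaning of an address literal reaches."""
    # Drop a zone ID, if present, before classifying (a choice of this sketch).
    address = ipaddress.ip_address(literal.partition("%")[0])
    # Check link-local first: Python also reports fe80::/10 as private.
    if address.is_link_local:
        return "link-local"
    if address.is_private:
        return "private"
    return "other"


print(describe("10.0.0.1"))      # private
print(describe("fe80::1%eth0"))  # link-local
```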

But more to the point, I think it misrepresents what URLs are. URLs are _universal identifiers_, but that does NOT mean that they contain the _universal identity_ of a resource. It just means that they subsume all other kinds of identifiers. It is perfectly fine to use URLs to identify data in a local application - e.g. something like `my-recipe-app:/chicken-curry/ingredients#4` is not a misuse of URLs, even if it fails to resolve, or resolves to something else, on another machine.

URLs are, IMO, simply a flexible syntax for expressing the different kinds of identifiers that exist, so that any application can see the URL `http://[::abcd%eth0]/config/foo`, understand what the different parts are, and infer how to connect to that resource, using the system interfaces available to do so (accepting that they may be configurable).
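As a rough sketch of what "understand what the different parts are" could look like, the following uses Python's `urllib.parse.urlsplit`, chosen only because it happens to tolerate the literal; it is not a WHATWG URL parser. The helper name `host_parts` is hypothetical, and splitting the host on `%` is an assumption of this sketch, not something the URL standard defines.

```python
from urllib.parse import urlsplit


def host_parts(url: str):
    """Split a URL into (scheme, address, zone, path); zone is None if absent."""
    parts = urlsplit(url)
    # .hostname strips the square brackets around an IPv6 literal.
    host = parts.hostname or ""
    # Everything after the first "%" is treated as the zone ID.
    address, _, zone = host.partition("%")
    return parts.scheme, address, zone or None, parts.path


print(host_parts("http://[::abcd%eth0]/config/foo"))
# ('http', '::abcd', 'eth0', '/config/foo')
```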

And I think it should be possible to express these kinds of locations under the `http` scheme. They are popular enough that many shipped products use them, and operating systems have included [the required interfaces](https://pubs.opengroup.org/onlinepubs/009696699/functions/if_nametoindex.html) to resolve these names for over a decade. They seem to be an intrinsic part of IPv6 addresses, so IMO the only reasonable course is to accept them as part of our support for IPv6 addresses.
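For illustration, Python exposes the same `if_nametoindex` interface through its `socket` module. The sketch below maps a zone ID to a numeric interface index. The helper name `zone_to_scope_id` is hypothetical, and accepting a purely numeric zone as an index is an assumption of this sketch (many platforms do, but it is not something the thread specifies).

```python
import socket


def zone_to_scope_id(zone: str) -> int:
    """Map a zone ID (interface name or numeric index) to an interface index."""
    # Assumption: a purely numeric zone is already an interface index.
    if zone.isascii() and zone.isdigit():
        return int(zone)
    # Otherwise ask the OS; raises OSError if no such interface exists.
    return socket.if_nametoindex(zone)


# List the interfaces this machine knows about and round-trip each name.
for index, name in socket.if_nameindex():
    print(name, zone_to_scope_id(name) == index)
```

The result depends entirely on the local machine's interfaces, which is the sense in which these names are client-specific.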

Of course, no client is obligated to support a particular kind of host. I don't see any technical reason for doing so, but browsers should be allowed to decline requests to such URLs if they wish. I hope they would at least make it a configurable option rather than an outright ban.

--
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/392#issuecomment-1397110482

Received on Thursday, 19 January 2023 15:01:05 UTC