Re: [whatwg/url] Define hosts' public suffix and registrable domain. (#391)

@sleevi @weppos I'll toss in some _hopefully helpful_ but really wordy info here as color.

For the last decade we have struggled with terms such as 'public suffix', 'eTLD', and 'registrable domain' being treated as interchangeable, without a focused definition for each.

I agree that we need a glossary or something to help make these definitions more clear, and perhaps identify and expire the use of some of them if we can.  But it gets tricky and nuanced.

Different users, developers, integrators, and contributors define these in a variety of ways, sometimes as synonyms, sometimes not.  This seems to result from the variation in how the PSL gets implemented within libraries or used in development.  Sometimes there is a granular distinction that drove a given term's usage.

**Pardon if I go too deep on this one but it helps us out in the process of coming up with a good path forward.**
In the past, @gerv the wise and powerful helped us tolerate these organic differences by reminding us that the PSL originated with cookie horizons, and of how it has grown since.

To avoid being drawn into the frothing energy around competing/alternative root TLD systems, the PSL maintainers opted to follow a document from ICANN called ICP-3, which defines a single authoritative root system for TLDs.  The IANA maintains the listings of the TLDs in that root and does not go deeper than these initial entries (so it includes .UK but not CO.UK, and .AU but not COM.AU).

A long time ago (_in a galaxy far, far away ;)_), it was discovered that one might be able to issue a 'super cookie' for CO.UK and slurp up all kinds of interesting data for the subdomains of CO.UK.  The PSL was born to dig deeper into these TLD structures, in order to know what to treat as though it were 'effectively' (hence 'eTLD') a Top Level Domain when really it is a second level (or deeper) domain such as CO.UK.

And thus was born the use of a static list to identify these nuances more elegantly than the IANA list could, and to go deeper into the effective namespaces.
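As a rough illustration of that 'effective TLD' idea, here is a minimal Python sketch.  The rule set is a tiny hand-picked sample rather than the real list, the function names are my own invention for this example, and it deliberately ignores the wildcard and exception rules that the real PSL format also has.

```python
# Tiny hand-picked sample of rules (an assumption for illustration only).
# The real PSL has thousands of entries, plus wildcard (*.) and
# exception (!) rules that this sketch does not handle.
SAMPLE_RULES = {"com", "uk", "co.uk", "au", "com.au"}


def public_suffix(host, rules=SAMPLE_RULES):
    """Return the longest suffix of `host` that matches a rule."""
    labels = host.lower().rstrip(".").split(".")
    # Try the longest candidate first so "co.uk" wins over "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in rules:
            return candidate
    # No rule matched: fall back to treating the last label as the suffix.
    return labels[-1]


def registrable_domain(host, rules=SAMPLE_RULES):
    """Return the public suffix plus one label, or None if host is a bare suffix."""
    labels = host.lower().rstrip(".").split(".")
    suffix_len = len(public_suffix(host, rules).split("."))
    if len(labels) <= suffix_len:
        return None
    return ".".join(labels[-(suffix_len + 1):])


def cookie_domain_allowed(cookie_domain, rules=SAMPLE_RULES):
    """Reject a cookie scoped to a bare public suffix (the 'super cookie' case)."""
    return registrable_domain(cookie_domain.lstrip("."), rules) is not None
```

With this sample, `cookie_domain_allowed("co.uk")` is rejected while `cookie_domain_allowed(".example.co.uk")` is accepted, which is the cookie horizon the list was created to draw.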

The benefit of such a list is that it can be cached or incorporated within one's software to understand how to treat entries.  A drawback is that keeping it updated creates challenges, because it is held in a centralized location (which defeats the benefits of the distributed nature of DNS that replaced the hosts.txt situation in the 80s).

In the years since, the PSL has really become the only widely used, community-maintained, frequently updated list of strings that might be expected to behave as if they are a TLD, even if they are not at the top level.

This evolved further.  While CO.UK is operated by Nominet who oversee .UK, and COM.AU is overseen by auDA who oversee .AU, there are some TLD-like systems that break from that direct and authoritative connection.  Over the course of time, systems like CentralNic started offering subdomain registrations, GitHub offered subdomain hosting, and Dyn (now Oracle) started to offer DNS host naming, etc.  US.COM, operated by CentralNic, is technically under .COM, but not operated directly by the .COM registry.  So there is a change in the administrative horizon that begins at the root, and we opted to split the PSL into two sections, putting the IANA top down / ICANN delegated zones into the 'ICANN' section, and placing this _stuff_ (mostly, it still needs constant audit) into a section that designates that horizon, the 'PRIVATE' section.
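For integrators, that split shows up as comment markers in the list file.  Here is a hedged Python sketch of how a consumer might tag each rule with its section.  The excerpt is hand-written for this example (not copied from the real file) and `parse_sections` is a hypothetical helper name, though the `===BEGIN ... DOMAINS===` markers follow the form used in the published list.

```python
# Hand-written excerpt in the style of the PSL file (illustrative only).
SAMPLE_PSL = """\
// ===BEGIN ICANN DOMAINS===
uk
co.uk
com
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
// CentralNic
us.com
github.io
// ===END PRIVATE DOMAINS===
"""


def parse_sections(text):
    """Map each rule to the section ('ICANN' or 'PRIVATE') it appears in."""
    begin, end, tail = "// ===BEGIN ", "// ===END ", " DOMAINS==="
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(begin) and line.endswith(tail):
            # Section name is whatever sits between the two marker halves.
            current = line[len(begin):-len(tail)]
        elif line.startswith(end):
            current = None
        elif line and not line.startswith("//") and current is not None:
            # Ordinary rule line: record which section it belongs to.
            sections[line] = current
    return sections
```

Note that any consumer written like this keys on the literal section names, so code that checks for `"PRIVATE"` would silently stop matching if a section were ever renamed.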

These lists seem at first like they are something that should be simple to compile, but the other maintainers and I would argue this is not the case.  As a result, developers and integrators and security experts, software libraries, certificate authorities, and browsers and search engines (and I could riff for a while on this) have leveraged the PSL as a core list (sometimes authority) on handling this stuff.

We as maintainers know what we do with it, but we are not all-knowing, and there is a spectrum of use cases that are impacted by changes we might make to the file, because of the processing that is done on it after it is downloaded.

Maintaining entries is non-disruptive to the list and the derivative users.  Renaming sections might be.  Defining these terms in a glossary may be helpful for future integrations, but not as much for where there is 'set and forget' code or processes.

I hope this is helpful - and not too "ivory tower" - as background.


-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/pull/391#issuecomment-392101729

Received on Friday, 25 May 2018 15:53:03 UTC