Re: [whatwg/url] Provide a succinct grammar for valid URL strings (#479)

While I unfortunately do not have the time to contribute to any work on this at the moment, I have a few thoughts.

1. First, I agree that care should be taken to avoid confusion about normativity. There should definitely be only one normative spec. If a grammar were to go into the spec itself alongside the algorithm, with the algorithm remaining normative, great care would be needed to keep the two in agreement, since any disagreement between them breeds problems.
1. Second, I believe that you already have not one but *two* alternate semi-normative specifications anyway: the section on writing URLs, which specifies a sort of grammar for how to write them out, and the test suite. I don't believe that anyone can state with certainty that the section on writing URLs actually matches the parser, and I think [this comment](https://github.com/whatwg/url/issues/418#issuecomment-429746822) *by one of the major contributors to the spec* shows that the test suite is treated as essentially as normative as the spec, if not more so.
1. Third, I am convinced that trying to define a grammar, normative or non-normative, for the spec as it is, is fundamentally a fool's errand.
1. But I am not of the opinion that this means that it shouldn't be done. I believe that the current parser should be ripped out entirely, or at least moved to an auxiliary specification on how browsers should implement an actual specification.

To elaborate a bit, I very much disagree with the claim that "the spec is good as is". The spec definitely provides an unambiguous specification with enough information to determine whether or not an implementation meets the specification. This is enough to meet the bare minimum requirements and be an adequate technical standard. But it has a number of flaws that make it difficult to use in practice:
    - It conflates domains. This URL specification is primarily geared towards the web and web standards, as is indicated by a lot of the implicit assumptions it makes (see also #535). But the use of URLs, and RFC 3986, extends far beyond the web and the spec does not make any meaningful attempt to address uses outside the web. Recommendations on displaying URLs to users are explicitly applicable only to browsers. It defines an API applicable only to the web, with no discussion of API design for other environments. It canonically defines `file` as the default scheme when no scheme is specified, when most clients would likely prefer to make that decision themselves.
    - Because it is a living standard, the spec is not suitable for use in many application domains. That may be acceptable for the web, but other interchange systems need a more reliable mechanism.
    - It contains almost no background or discussion: only a section listing the goals of the document and three sparse paragraphs on security considerations. It does not explain the purpose of a URL or the *human* meaning of its various components. It explains almost none of its decisions, such as why special schemes are special, why particular API setters behave the way they do, or why special schemes are given an elevated place in the spec, with their scheme-specific parsing requirements incorporated into it.
    - It is poorly organized. For instance, it discusses security considerations in sections 1.3 and 4.8 but does not mention them in section 2, the security considerations section itself.
    - Most relevantly to the original topic here, it is nearly impossible for a human to reason about whether or not a URL is valid without manually executing the algorithm. It is incredibly opaque. There is no benefit to this. I defer to @sjamaan's [excellent comment](https://github.com/whatwg/url/issues/24#issuecomment-420407600). I find the suggestion that section 4.3 provides a useful "overview" of the grammar to be ridiculous. It doesn't. It's just as opaque as the rest of the document.
    - As an additional point, the opacity of the spec makes it nearly impossible to reason about whether a given behaviour is intentional or a bug. The spec is defined by the *implementation* in pseudocode. Even understanding the spec's behaviour given an input, much less deciding whether or not it is correct, effectively requires *debugging* the specification.
    - There is no abstraction of related concepts, and syntax and semantics are mixed across technical layers: semantic errors (an out-of-range port, for example) are reported during parsing, rather than in a separate step on the parsed values.
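
The layering complaint in the last item can be illustrated with a minimal Python sketch that keeps the two steps apart. The syntactic step uses the component-splitting regular expression from RFC 3986, Appendix B, which matches every string and only assigns substrings to components. The semantic step then inspects the already-parsed values. The two rules it applies (a scheme must be present, and a port must fit in 16 bits) are illustrative assumptions, and the names `split_uri`, `Components`, and `semantic_errors` are hypothetical, not taken from either specification.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Component-splitting regex from RFC 3986, Appendix B. Every group is
# optional, so it matches any string: it splits, it does not validate.
URI_SPLIT = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


@dataclass
class Components:
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


def split_uri(text: str) -> Components:
    """Syntactic step: assign substrings to components; never fails."""
    m = URI_SPLIT.match(text)
    return Components(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


def semantic_errors(c: Components) -> List[str]:
    """Semantic step (hypothetical rules) run on already-parsed values."""
    errors = []
    if c.scheme is None:
        errors.append("missing scheme")
    if c.authority is not None:
        host_port = c.authority.rsplit("@", 1)[-1]  # drop any userinfo
        # A trailing "]" means an IP literal such as [::1] with no port.
        if ":" in host_port and not host_port.endswith("]"):
            port = host_port.rsplit(":", 1)[1]
            # An empty port is allowed; otherwise require ASCII digits <= 65535.
            if port and not (port.isascii() and port.isdigit() and int(port) <= 65535):
                errors.append(f"invalid port: {port}")
    return errors
```

For example, `split_uri("https://example.com:99999/")` succeeds, and the out-of-range port is only reported afterwards, when `semantic_errors` returns `["invalid port: 99999"]`. Keeping the steps separate means the set of syntactically well-formed strings can be described by a grammar on its own, independently of which values a given scheme or application accepts.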

It is worth noting that this specification explicitly intends to obsolete RFC 3986. RFC 3986 is a confusing mix of normative and informative text, and a difficult specification to apply and use. Yet this specification is far from being able to obsolete it, because it targets only one application domain.

In conclusion, this spec is a [PHP Hammer](https://blog.codinghorror.com/the-php-singularity/). It is not "good". It is barely adequate in the one domain it chooses to support, and abysmal in any other domain.

If the direction of this standard can't reasonably be changed (assuming there are people willing to put in the effort), and in particular if the WHATWG is not interested in addressing other domains in this specification, I would fully support an effort, likely through the IETF's RFC process, to design a specification that actually does replace RFC 3986, with the WHATWG spec recognized only as the web's standard for implementing that domain-agnostic URL specification. I will probably direct whatever energy I do find for this topic to that project rather than this one.

View it on GitHub:
https://github.com/whatwg/url/issues/479#issuecomment-714222112

Received on Thursday, 22 October 2020 04:46:24 UTC