Re: [whatwg/url] Grammar specification for URLs (#24)

@sjamaan 

If an overview of what a URL looks like is desired, https://url.spec.whatwg.org/#url-writing should be more than enough. With the exception of the definition of a [valid domain](https://url.spec.whatwg.org/#valid-domain), it could possibly even be translated into a formal grammar for valid URLs, though I will admit I haven't tried doing so. Such a grammar would certainly make comparing with the RFC much easier.

The main reason this spec exists is that it defines error handling rigorously. This is precisely what a formal grammar cannot do: a grammar defines what a valid URL is, not what to do with an invalid one. Web content is unfortunately overwhelmingly erroneous, and part of the mission of the WHATWG is to make the web interoperable, which includes handling errors in a uniform fashion. For URL parsers outside of a browser, failing on an invalid URL may be an option. For web browsers, for the most part, it is not.

With regard to security concerns, a solution would be for everyone to adopt this standard's error handling behavior :D Indeed, we are seeing more and more adoption of this standard in widely used libraries and runtimes, such as Rust's [url](https://docs.rs/url/1.7.1/url/) crate and Node.js's `url` module.

----

I hope this has answered your question about:

> it baffled me to read the statement that "there are several large parts of the spec that cannot be captured by any kind of grammar". This is literally equivalent to saying "we can't know if an URL will be valid without evaluating the algorithm".

Unlike the RFC, we intend to fully define the behavior when an invalid URL is encountered. This leads to a well-defined difference between "valid" and "parsable": whether a URL is valid is generally easy to tell from the writing rules, while whether it is parsable unfortunately may not be, short of running the parsing algorithm.
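
To make the "valid" versus "parsable" distinction concrete, here is a minimal sketch using the `URL` constructor that browsers and Node.js expose. The helper name `tryParse` and the sample inputs are my own, chosen purely for illustration. The first two inputs are not valid URL strings, yet the parser recovers from them; the third makes the parser return failure.

```typescript
// tryParse is a hypothetical helper for this example, not part of any library.
// It returns the serialized URL, or null when the parser reports failure.
function tryParse(input: string): string | null {
  try {
    return new URL(input).href; // serialization after any error recovery
  } catch {
    return null; // the constructor throws a TypeError on parse failure
  }
}

// Invalid (a space in the path) but parsable: the space is percent-encoded.
const recoveredSpace = tryParse("https://example.com/a b");

// Invalid (backslashes in place of slashes) but parsable for special schemes.
const recoveredBackslash = tryParse("https:\\\\example.com\\path");

// Invalid and not parsable: a space is not allowed in the host.
const failed = tryParse("https://exa mple.com/");

console.log(recoveredSpace);     // https://example.com/a%20b
console.log(recoveredBackslash); // https://example.com/path
console.log(failed);             // null
```

A grammar for valid URLs would reject all three inputs alike; only the parsing algorithm says which ones still yield a URL record and what that record looks like.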

-- 
View it on GitHub:
https://github.com/whatwg/url/issues/24#issuecomment-420481794

Received on Wednesday, 12 September 2018 01:35:23 UTC