Re: [whatwg/url] It is unlikely that the spec's two definitions of valid URL strings are equal (Issue #704)

This is an excellent point and really cuts, in a way, to the core of what #479 was originally about. Thank you!

Given that this is a well-explored area of theory, I think we can do better than a fuzzer and use a formally verified implementation of the parser. Then the spec could be arranged as follows, perhaps:

1. Make a formal grammar defining the valid URLs.
2. Add a section with a "web algorithm" which parsea URLs and determines validity.
3. Write an implementation of the algorithm in a language with formal verification (Idris would be my choice but only because or familiarity) that proves equivalence of the two approaches.

There could potentially be other benefits:

1. Proving the equivalence of various portions of the URL as parsed by the algorithm with the equivalent grammar productions, validating the use of other algorithms to produce the same result, at least on valid input.
2. Exploring whether there is a more succinct way to describe a "parseable" URL and the corresponding valid one.
3. With a formally verified implementation validating against a grammar, it would be possible to propose changes to the structure of the algorithm to improve clarity or fix bugs with confidence that they do not have unexpected consequences, which would make the spec easier to maintain.

I do have thoughts on the "named validation errors" issue but I think those belong in a thread of their own.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/704#issuecomment-1235413470
You are receiving this because you are subscribed to this thread.

Message ID: <whatwg/url/issues/704/1235413470@github.com>

Received on Friday, 2 September 2022 11:58:28 UTC