Re: [whatwg/url] Provide a succinct grammar for valid URL strings (#479)

To comment on this list:

* [ ]  Decide if an addendum is enough, or if RFC3986/3987 should be merged (the latter has my preference)
You don't need to decide this in a "rough consensus" way: people will disagree about what is or isn't "enough".

Do the minimum; if that isn't enough then you can do more.

* [ ]  Decide if the full WHATWG parsing/resolution behaviour should be included, or if it is enough to provide the elementary operations that can then be recombined in the WHATWG standard to exactly reproduce their current behaviour (latter one has my preference, then the standards can really be complementary!)

Again, start with the minimum.

* [ ]  Decide how to include the loose grammar in such a document (my preference: parameterise the character sets)

What about leaving out the "loose" grammar? If that's what people want, they should look at WHATWG's URL.


I don't understand these:

* [ ]  Rewrite my 'force' operation into the RFC style and maybe refactor the merge operations from RFC3986 a little, or switch to my model of sequences more wholeheartedly.
* [ ]  Amend or parameterise the 'path merge' to support the WHATWG percent-encoded dotted segments.
* [ ]  A remaining technical issue: solve #574, and figure out how to incorporate that into the RFC grammar

* [ ]  Decide what to do with the numbers in the ip-addresses of the loose grammar, esp. how to express their allowed range (i.e. on the grammatical level as in RFC3986 or on a semantic level)

Is this necessary if there isn't a "loose grammar"?

* [ ]  Preferably, find implementations of the existing RFCs, work with them to implement the additions and have them test against the wpt test suite, to corroborate that the additions can be combined to express the WHATWG behaviour
* [ ]  Expand the wpt test suite to include validity tests (!!)


These seem more useful than writing more specs.

* [ ]  Write about the encoding-normal form, parameterise it by component-dependent character sets, so that the percentEncodeSets of the WHATWG standard can be plugged into the comparison ladder nicely.

To what end? For what audience? Going forward, shouldn't we aim toward UTF-8?

* [ ]  For the WHATWG standard: decide if a precomposed version of the 'basic-url-parser' should be kept or if it should be split up. It may be possible to automatically generate a precomposed version from an implementation of the elementary operations, and to also automatically generate the pseudocode from that.
 
What's the minimum? I wouldn't count on a sudden welling of support.

> Let's get started!

I hope you take my comments as constructive.

https://github.com/whatwg/url/issues/479#issuecomment-865468969

Received on Tuesday, 22 June 2021 02:18:39 UTC