Re: [whatwg/url] Provide a succinct grammar for valid URL strings (#479) from Alexis Hunt on 2020-05-12 (public-webapps-github@w3.org from May 2020)

From: Alexis Hunt <notifications@github.com>
Date: Tue, 12 May 2020 09:45:54 -0700
To: whatwg/url <url@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/url/issues/479/627461554@github.com>

Apologies for the slow reply.

I think that @masinter is right, and that my concern matches the ones discussed there. I skimmed those threads and a few statements concerned me, such as the assertion that a full Turing machine is required to parse URLs. That would very much surprise me. My instinct on reading the grammar is that, once you separate out the different paths for `file`, special schemes, and non-special schemes in relative URLs, the result is almost certainly context-free. It might even be regular. The fact that the given algorithm mostly does a single pass over the input is a strong indication that complex parsing is not required.
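As a rough illustration of the "might even be regular" point: RFC 3986 (a different spec from the WHATWG URL Standard) gives a single regular expression in its Appendix B that splits a URI reference into components. The sketch below wraps that expression in Python. The function name `split_uri` is hypothetical, and the code only splits; it does not validate anything or implement the WHATWG parsing algorithm.

```python
import re

# Regular expression from RFC 3986, Appendix B. It only splits a URI
# reference into components; it is not the WHATWG URL parser.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(ref):
    """Split a URI reference into its five generic components.

    Components that are absent come back as None (the path may be "").
    """
    m = URI_RE.match(ref)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


parts = split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related")
print(parts["scheme"], parts["authority"], parts["path"], parts["fragment"])
```

This says nothing about whether the WHATWG rules (backslash handling, special schemes, `file`) stay regular; it only shows that component splitting in a closely related spec is.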

I discussed this with a colleague of mine, @tabatkins, and they said that the CSS syntax parser was much improved when it was rewritten from a state-machine-based parser into a recursive descent parser. Doing the same for URLs would effectively require writing the URL grammar out as a context-free grammar, which would make providing a BNF-like specification, even if it's only informative, very easy.
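To make the recursive descent idea concrete, here is a minimal sketch for a deliberately simplified, hypothetical grammar (not the one in the URL Standard): `url = scheme ":" [ "//" authority ] path [ "?" query ] [ "#" fragment ]`. Each nonterminal gets one method, so the code reads almost like the BNF. The class name `Parser` and its helpers are assumptions made for illustration.

```python
import string

SCHEME_START = set(string.ascii_letters)
SCHEME_REST = SCHEME_START | set(string.digits) | set("+-.")


class Parser:
    """Recursive descent parser for a toy URL grammar (illustrative only)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def accept(self, literal):
        # Consume `literal` if it appears at the current position.
        if self.text.startswith(literal, self.pos):
            self.pos += len(literal)
            return True
        return False

    def take_until(self, stops):
        # Consume characters up to (not including) any character in `stops`.
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos] not in stops:
            self.pos += 1
        return self.text[start:self.pos]

    # url = scheme ":" [ "//" authority ] path [ "?" query ] [ "#" fragment ]
    def url(self):
        scheme = self.scheme()
        if not self.accept(":"):
            raise ValueError("expected ':' after scheme")
        authority = self.take_until("/?#") if self.accept("//") else None
        path = self.take_until("?#")
        query = self.take_until("#") if self.accept("?") else None
        fragment = None
        if self.accept("#"):
            fragment = self.text[self.pos:]
            self.pos = len(self.text)
        return {"scheme": scheme, "authority": authority, "path": path,
                "query": query, "fragment": fragment}

    # scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    def scheme(self):
        start = self.pos
        if self.pos >= len(self.text) or self.text[self.pos] not in SCHEME_START:
            raise ValueError("scheme must start with an ASCII letter")
        while self.pos < len(self.text) and self.text[self.pos] in SCHEME_REST:
            self.pos += 1
        return self.text[start:self.pos]


print(Parser("https://example.com/a/b?x=1#top").url())
```

The real grammar would need separate productions for `file`, special, and non-special schemes, plus relative URLs; the point here is only that the grammar comment above each method is the specification.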

On a related note, splitting the parsing out from the semantic functions (checking some validity rules, creating the result URL when parsing a relative URL string) would likely improve the readability of the spec and make it simpler to implement. I think this might be better suited for a separate thread, though, as there are some other thoughts I have in this vein as well.
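A small sketch of what that separation could look like, using a hypothetical `host:port` fragment rather than anything from the spec: the parse step is purely syntactic and keeps the port as raw text, while a separate pass reports validity problems. All names here (`HostPort`, `parse_host_port`, `validation_errors`) and the specific rules are assumptions for illustration, and the toy split does not handle IPv6 literals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HostPort:
    host: str
    port: Optional[str]  # raw text; the parser does not interpret it


def parse_host_port(text: str) -> HostPort:
    """Syntactic step only: split on the last ':' without judging the parts."""
    host, sep, port = text.rpartition(":")
    if not sep:
        return HostPort(host=text, port=None)
    return HostPort(host=host, port=port)


def validation_errors(hp: HostPort) -> List[str]:
    """Semantic step: check the parsed structure against validity rules."""
    errors = []
    if not hp.host:
        errors.append("empty host")
    if hp.port is not None:
        if not (hp.port.isascii() and hp.port.isdigit()):
            errors.append("port is not a decimal number")
        elif int(hp.port) > 65535:
            errors.append("port out of range")
    return errors


hp = parse_host_port("example.com:99999")
print(hp, validation_errors(hp))
```

Because the parser never rejects input on semantic grounds, the grammar stays small, and the validity rules can be listed and tested independently of it.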

--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/479#issuecomment-627461554

Received on Tuesday, 12 May 2020 16:46:07 UTC