Re: [csswg-drafts] [selectors-4] Augment the grammar to unambigously encode handling of white-space? (#10940)

From an implementer's perspective, the problem with describing white-space handling informally in the Selectors spec is that the grammar evidently operates on _tokens_, not on e.g. Unicode code points: the spec defers to the Syntax specification, which defines tokenization, and simply expects a stream of tokens. It follows that, per the Syntax spec, white-space tokens are a thing, a "first-class citizen" so to speak.

Now, for the Selectors grammar to simply omit them from its productions, and instead define informally, in prose, how parsers should deal with these tokens, does in my opinion a real disservice to parser writers. This is a bold claim, I admit, so let me elaborate. Assume a non-trivial percentage of people _reading_ the [Selectors] spec are doing so in order to implement a [CSS selectors] parser. Making the spec _readable_ is in general absolutely a sound decision, but in this specific case it is apparently done at the expense of making a parser easier to implement, through a grammar that omits white-space tokens, which are first-class CSS citizens!

I am not advocating for dispensing with the grammar. For my part, it has made implementing selector parsing much easier, since I could e.g. just _copy_ it, as-is, into a file, have it parsed according to the corresponding notation (defined in large part in Values & Units), then feed the resulting grammar object to a parser generator, which would in theory give me a working CSS selectors parser. Or I could express the equivalent of the grammar myself, in code (without a parser generator), and feed it to a general parser. The latter is what my implementation does now, despite my having said I am "using a parser generator"; I should have made clear that this is the goal, not the current state.

In either case, I have had to "manually" insert `<whitespace-token>?` elements into parts of my grammar expression in order to implement what the Selectors document otherwise specifies in prose. And I cannot see why the Selectors grammar can't just include the corresponding white-space productions explicitly, given that white-space tokens are, after all, specified in Syntax and vended by its abstract tokenization procedure. That would dispense with having to specify this part of the language informally.
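
As a rough illustration of the point above, here is a minimal Python sketch of what writing `<whitespace-token>?` into a production buys a parser. Everything in it is hypothetical: `Token`, `Element`, `matches` and the toy `CHILD` production are names made up for this example, and the production is not taken from the Selectors grammar. The matcher is deliberately naive (greedy, no backtracking), which is enough for this case.

```python
from typing import NamedTuple, Sequence


class Token(NamedTuple):
    kind: str   # hypothetical token kinds: "ident", "delim", "whitespace"
    value: str


class Element(NamedTuple):
    kind: str
    optional: bool = False


def matches(production: Sequence[Element], tokens: Sequence[Token]) -> bool:
    """Greedily match a flat production against a token sequence."""
    i = 0
    for element in production:
        if i < len(tokens) and tokens[i].kind == element.kind:
            i += 1                      # element consumed one token
        elif not element.optional:
            return False                # required element is missing
    return i == len(tokens)             # every token must be consumed


WS = Element("whitespace", optional=True)

# Toy production with white-space spelled out explicitly:
#   <ident-token> <whitespace-token>? <delim-token> <whitespace-token>? <ident-token>
CHILD = [Element("ident"), WS, Element("delim"), WS, Element("ident")]

# The same production with white-space left to prose.
CHILD_NO_WS = [Element("ident"), Element("delim"), Element("ident")]

spaced = [Token("ident", "a"), Token("whitespace", " "), Token("delim", ">"),
          Token("whitespace", " "), Token("ident", "b")]
tight = [Token("ident", "a"), Token("delim", ">"), Token("ident", "b")]

print(matches(CHILD, spaced), matches(CHILD, tight))  # both forms accepted
print(matches(CHILD_NO_WS, spaced))                   # rejected without extra logic
```

With the explicit production the matcher needs no special knowledge of white-space; with the implicit one, that knowledge has to be added to the parser by hand.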

The same would go for comment tokens, since the above wouldn't cover parsing of `foo/**/bar`. But that is another issue. For my part, I solved it by having comment tokens be white-space tokens (not vice versa), although, thinking about it lately, I should instead have a common superclass called `SpaceToken`, of which `WhiteSpaceToken` and `CommentToken` are subclasses. Then the parser could be oblivious to _the kind_ of white-space it encounters and deal only with `<space-token>` productions, with these still being explicit in the grammar.
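
The class hierarchy described above could look roughly like the following Python sketch. Only the names `SpaceToken`, `WhiteSpaceToken` and `CommentToken` come from the comment; `Token`, `IdentToken`, the `text` field and the `skip_space` helper are assumptions added for illustration, and the sketch assumes a tokenizer that emits comment tokens at all.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Token:
    text: str  # hypothetical field: the source text of the token


class SpaceToken(Token):
    """Common superclass: anything the parser treats as <space-token>."""


class WhiteSpaceToken(SpaceToken):
    """Spaces, tabs, newlines."""


class CommentToken(SpaceToken):
    """A comment such as /**/."""


class IdentToken(Token):
    """An identifier such as foo."""


def skip_space(tokens: Sequence[Token], i: int) -> int:
    """Return the index of the first token at or after i that is not a SpaceToken."""
    while i < len(tokens) and isinstance(tokens[i], SpaceToken):
        i += 1
    return i


# Assumed token stream for `foo/**/ bar`.
tokens = [IdentToken("foo"), CommentToken("/**/"), WhiteSpaceToken(" "), IdentToken("bar")]
next_index = skip_space(tokens, 1)
print(tokens[next_index].text)  # the parser never asked which kind of space it skipped
```

The parser only tests for `SpaceToken`, so adding another kind of space later would not touch the parsing code.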

-- 
GitHub Notification of comment by amn
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/10940#issuecomment-2376641082 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Thursday, 26 September 2024 11:08:34 UTC