- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Fri, 31 Dec 2021 08:32:38 -0700
- To: Norm Tovey-Walsh <norm@saxonica.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, ixml <public-ixml@w3.org>
In addition to the email Steven recently sent [1] proposing this as a topic for discussion, there is a thread from April starting with another mail from Steven [2], in which we discussed this.

In that earlier thread, two points came up in connection with a simple rule saying “you must in all cases consume the entire input”:

- In the case of an infinite stream of characters, or one of indefinite length, it’s not clear what that would mean or how a processor would know when the stream was complete in the absence of an end-of-stream signal.

- It can be reasonable to hand a grammar and an input string to a parser and ask “what prefixes of this input string match this grammar?”

The current wording of the spec reflects, I think, a sense that it’s better to say nothing about those cases than attempt to define them crisply; any processor that wants to support either of those use cases will do so by extending the usual set of behaviors.

There is an obvious and important interaction with the wording of our conformance clause. But though the interaction is important, it is also extremely tedious to discuss, and it’s clearly difficult to sustain the level of interest and engagement necessary to reach informed consensus, since the cases that must be considered are more or less necessarily remote from most of our immediate concerns.

[1] https://lists.w3.org/Archives/Public/public-ixml/2021Dec/0097.html
[2] https://lists.w3.org/Archives/Public/public-ixml/2021Apr/0007.html

> On 31 Dec 2021, at 3:16 AM, Norm Tovey-Walsh <norm@saxonica.com> wrote:
>
> Hello,
>
> I feel like I saw mention of this recently, but can’t now put my hands
> on the message where I saw it. Apologies for my failure to get this
> message into the correct thread.
>
> Consider this test from Steven:
>
>   a: "a", spaces, b.
>   b: spaces, "b".
>   spaces: " "*.
>
> And the sample input file for that test:
>
>   a   b
>
> For clarity:
>
>   $ od -a tests/ambig3.inp
>   0000000 a sp sp sp b nl
>   0000006
>
> I assert that the input does not match the grammar because there’s no
> parse that allows the trailing newline character.
>
> We could say that it matches, with a trailing newline left over, but I’d
> rather not. If we do, it’ll just introduce more variation in what the
> processor has to consume and produce. If trailing whitespace is allowed,
> why not leading whitespace? Why not both? Exactly one, or arbitrary
> amounts? What if I want a grammar that *does* match leading and/or
> trailing whitespace, etc. etc. etc.
>
> The grammar could be updated to accept trailing newlines, or the user
> could strip them off before attempting to parse. Either of those seems
> preferable to saying that arbitrary left over characters at the ends are
> ok.
>
> With respect to the test suite, I’d be happy to say that all inputs
> should have either all or exactly one trailing newline stripped off
> before attempting to parse. Or not. A decent editor should allow you to
> control whether or not a trailing newline occurs, it’s just a little
> tedious to manage the distinction.
>
> Be seeing you,
>   norm
>
> --
> Norm Tovey-Walsh
> Saxonica
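
For anyone who wants to see the two readings side by side, here is a minimal sketch in Python. It is not an ixml processor. It relies on the assumption that the language of the test grammar above happens to be regular ("a", any number of spaces, "b"), so an ordinary regular expression can stand in for it. The names AMBIG3, matches_whole_input and matching_prefix are made up for this illustration, and the sketch ignores the ambiguity the test is really about.

```python
import re
from typing import Optional

# Stand-in for the test grammar (an assumption for illustration only):
#   a: "a", spaces, b.   b: spaces, "b".   spaces: " "*.
# Its language is "a", zero or more spaces, "b". Ambiguity is ignored here.
AMBIG3 = re.compile(r"a *b")


def matches_whole_input(text: str) -> bool:
    """Strict reading: the grammar must account for every input character."""
    return AMBIG3.fullmatch(text) is not None


def matching_prefix(text: str) -> Optional[str]:
    """Prefix reading: return the leading part of the input that matches, if any."""
    m = AMBIG3.match(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    sample = "a   b\n"  # same bytes as the od dump: a sp sp sp b nl
    print(matches_whole_input(sample))               # strict, newline kept
    print(matches_whole_input(sample.rstrip("\n")))  # strict, newline stripped
    print(repr(matching_prefix(sample)))             # prefix reading
```

Under the strict reading the sample with its trailing newline is rejected, and it is accepted once the newline is stripped. Under the prefix reading the first five characters match and the newline is left over.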
Received on Friday, 31 December 2021 15:32:56 UTC