Re: Does ixml have to match the whole input? from C. M. Sperberg-McQueen on 2021-12-31 (public-ixml@w3.org from December 2021)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Fri, 31 Dec 2021 08:32:38 -0700
To: Norm Tovey-Walsh <norm@saxonica.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, ixml <public-ixml@w3.org>
Message-Id: <CA8F449D-6BDF-4CE7-B06F-634421E50EAB@blackmesatech.com>

In addition to the email Steven recently sent [1] proposing this
as a topic for discussion, there is a thread from April starting with
another mail from Steven [2], in which we discussed this.  In that
earlier thread, two points came up in connection with a simple rule
saying “you must in all cases consume the entire input”:

    - in the case of an infinite stream of characters, or one of
      indefinite length, it’s not clear what that would mean
      or how a processor would know when the stream was
      complete in the absence of an end-of-stream signal.

    - It can be reasonable to hand a grammar and an input
      string to a parser and ask “what prefixes of this input string
      match this grammar?”
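
As a rough illustration of the second point, here is a minimal Python
sketch of the two questions ("does the whole input match?" and "which
prefixes match?"), applied to the grammar and input Norm quotes below.
It is not an ixml processor: the regular expression is a hand-written
stand-in that I am assuming accepts the same strings as that grammar,
and the function names are invented for the example.

```python
import re

# Assumed regex stand-in for the grammar quoted later in this message:
#   a: "a", spaces, b.    b: spaces, "b".    spaces: " "*.
# It accepts the same strings but does not model the grammar's ambiguity.
GRAMMAR = re.compile(r"a *b")


def matches_whole_input(text: str) -> bool:
    """True only if the grammar consumes every character of text."""
    return GRAMMAR.fullmatch(text) is not None


def matching_prefixes(text: str) -> list[str]:
    """Return every prefix of text that the grammar matches in full."""
    return [
        text[:n]
        for n in range(len(text) + 1)
        if GRAMMAR.fullmatch(text[:n]) is not None
    ]


sample = "a   b\n"  # the test input, including its trailing newline
print(matches_whole_input(sample))  # False: the newline is left over
print(matching_prefixes(sample))    # ['a   b']
```

Under the "consume the entire input" rule only the first function is
relevant; the second is the kind of extension a processor might offer
for the prefix use case.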

The current wording of the spec reflects, I think, a sense that
it’s better to say nothing about those cases than to attempt to
define them crisply; any processor that wants to support either
of those use cases will do so by extending the usual set of
behaviors.  There is an obvious and important interaction
with the wording of our conformance clause.  But important as
that interaction is, it is also extremely tedious to discuss,
and it’s clearly difficult to sustain the level of interest and
engagement necessary to reach informed consensus, since
the cases that must be considered are more or less necessarily
remote from most of our immediate concerns.

[1] https://lists.w3.org/Archives/Public/public-ixml/2021Dec/0097.html
[2] https://lists.w3.org/Archives/Public/public-ixml/2021Apr/0007.html

> On 31 Dec 2021, at 3:16 AM, Norm Tovey-Walsh <norm@saxonica.com> wrote:
> 
> Hello,
> 
> I feel like I saw mention of this recently, but can’t now put my hands
> on the message where I saw it. Apologies for my failure to get this
> message into the correct thread.
> 
> Consider this test from Steven:
> 
> a: "a", spaces, b.
> b: spaces, "b".
> spaces: " "*.
> 
> And the sample input file for that test:
> 
> a   b
> 
> For clarity:
> 
> $ od -a tests/ambig3.inp
> 0000000    a  sp  sp  sp   b  nl
> 0000006
> 
> I assert that the input does not match the grammar because there’s no
> parse that allows the trailing newline character.
> 
> We could say that it matches, with a trailing newline left over, but I’d
> rather not. If we do, it’ll just introduce more variation in what the
> processor has to consume and produce. If trailing whitespace is allowed,
> why not leading whitespace? Why not both? Exactly one, or arbitrary
> amounts? What if I want a grammar that *does* match leading and/or
> trailing whitespace? Etc. etc. etc.
> 
> The grammar could be updated to accept trailing newlines, or the user
> could strip them off before attempting to parse. Either of those seems
> preferable to saying that arbitrary leftover characters at the ends are
> ok.
> 
> With respect to the test suite, I’d be happy to say that all inputs
> should have either all or exactly one trailing newline stripped off
> before attempting to parse. Or not. A decent editor should allow you to
> control whether or not a trailing newline occurs; it’s just a little
> tedious to manage the distinction.
> 
>                                        Be seeing you,
>                                          norm
> 
> --
> Norm Tovey-Walsh
> Saxonica

Received on Friday, 31 December 2021 15:32:56 UTC