Does ixml have to match the whole input? from Norm Tovey-Walsh on 2021-12-31 (public-ixml@w3.org from December 2021)

From: Norm Tovey-Walsh <norm@saxonica.com>
Date: Fri, 31 Dec 2021 10:16:49 +0000
To: ixml <public-ixml@w3.org>
Message-ID: <m2o84xvzda.fsf@saxonica.com>

Hello,

I feel like I saw mention of this recently, but can’t now put my hands
on the message where I saw it. Apologies for my failure to get this
message into the correct thread.

Consider this test from Steven:

a: "a", spaces, b.
b: spaces, "b".
spaces: " "*.

And the sample input file for that test:

a   b

For clarity:

$ od -a tests/ambig3.inp
0000000    a  sp  sp  sp   b  nl
0000006

I assert that the input does not match the grammar because there’s no
parse that consumes the trailing newline character.
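To make the distinction concrete, here is a minimal Python sketch. It is an illustration, not part of any ixml processor: a recognizer hand-written for just this three-rule grammar, with a hypothetical `require_full` flag that switches between requiring the whole input to be consumed and tolerating leftover characters after a parse.

```python
def count_parses(text: str, require_full: bool = True) -> int:
    """Count the parses of `text` for the three-rule grammar
        a: "a", spaces, b.
        b: spaces, "b".
        spaces: " "*.
    Hand-written for this one grammar only; not a general ixml processor.
    `require_full` is a hypothetical flag: when False, characters after
    the parse are ignored instead of causing the match to fail.
    """
    if not text.startswith("a"):
        return 0
    # Measure the run of spaces that follows the initial "a".
    run = 0
    while 1 + run < len(text) and text[1 + run] == " ":
        run += 1
    end = 1 + run
    if end >= len(text) or text[end] != "b":
        return 0
    end += 1  # position just past the "b"
    if require_full and end != len(text):
        return 0  # input left over: no parse covers the whole input
    # The run of spaces can be split between the two `spaces`
    # nonterminals in run + 1 ways, which is why the grammar is ambiguous.
    return run + 1


sample = "a   b\n"  # same bytes as the od dump of tests/ambig3.inp
print(count_parses(sample))                      # 0
print(count_parses(sample, require_full=False))  # 4
print(count_parses(sample.rstrip("\n")))         # 4
```

With the whole input required, the newline leaves nothing to match; with leftovers tolerated, the same four ambiguous parses appear as for the stripped input.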

We could say that it matches, with a trailing newline left over, but I’d
rather not. If we do, it’ll just introduce more variation in what the
processor has to consume and produce. If trailing whitespace is allowed,
why not leading whitespace? Why not both? Exactly one, or arbitrary
amounts? What if I want a grammar that *does* match leading and/or
trailing whitespace? Etc. etc. etc.

The grammar could be updated to accept trailing newlines, or the user
could strip them off before attempting to parse. Either of those seems
preferable to saying that arbitrary leftover characters at the ends are
OK.

With respect to the test suite, I’d be happy to say that all inputs
should have either all trailing newlines or exactly one trailing newline
stripped off before attempting to parse. Or not. A decent editor should
let you control whether or not a trailing newline occurs; it’s just a
little tedious to manage the distinction.
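The two test-suite conventions only differ when an input ends with more than one newline. A minimal Python sketch, assuming LF-only line endings (a CRLF file would need its `\r` handled as well); the function names are illustrative and not taken from any existing test harness:

```python
def strip_one_newline(text: str) -> str:
    """Remove exactly one trailing LF, if present (CRLF not handled)."""
    return text[:-1] if text.endswith("\n") else text


def strip_all_newlines(text: str) -> str:
    """Remove every trailing LF (CRLF not handled)."""
    return text.rstrip("\n")


# A hypothetical input file saved with an extra blank line at the end.
raw = "a   b\n\n"
print(repr(strip_one_newline(raw)))   # 'a   b\n'
print(repr(strip_all_newlines(raw)))  # 'a   b'
```

Under the "exactly one" convention the second newline survives, so that input would still fail to match the grammar above; under the "all" convention it would match.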

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica

Received on Friday, 31 December 2021 10:32:35 UTC