- From: Norm Tovey-Walsh <norm@saxonica.com>
- Date: Fri, 31 Dec 2021 10:16:49 +0000
- To: ixml <public-ixml@w3.org>
- Message-ID: <m2o84xvzda.fsf@saxonica.com>
Hello,

I feel like I saw mention of this recently, but can’t now put my hands on the message where I saw it. Apologies for my failure to get this message into the correct thread.

Consider this test from Steven:

  a: "a", spaces, b.
  b: spaces, "b".
  spaces: " "*.

And the sample input file for that test:

  a   b

For clarity:

  $ od -a tests/ambig3.inp
  0000000   a  sp  sp  sp   b  nl
  0000006

I assert that the input does not match the grammar because there’s no parse that allows the trailing newline character.

We could say that it matches, with a trailing newline left over, but I’d rather not. If we do, it’ll just introduce more variation in what the processor has to consume and produce. If trailing whitespace is allowed, why not leading whitespace? Why not both? Exactly one, or arbitrary amounts? What if I want a grammar that *does* match leading and/or trailing whitespace? Etc. etc. etc.

The grammar could be updated to accept trailing newlines, or the user could strip them off before attempting to parse. Either of those seems preferable to saying that arbitrary leftover characters at the ends are ok.

With respect to the test suite, I’d be happy to say that all inputs should have either all trailing newlines or exactly one trailing newline stripped off before attempting to parse. Or not. A decent editor should allow you to control whether or not a trailing newline occurs; it’s just a little tedious to manage the distinction.

Be seeing you,
  norm

--
Norm Tovey-Walsh
Saxonica
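The non-match described above, and the two stripping policies, can be illustrated with a small Python sketch. This is only a hand translation of the grammar into a regular expression, not an ixml processor; the helper names `matches`, `strip_one_newline` and `strip_all_newlines` are made up for illustration.

```python
import re

# Hand translation of the grammar (an assumption, not how an ixml
# processor works):
#   a: "a", spaces, b.   b: spaces, "b".   spaces: " "*.
# The two `spaces` runs are adjacent, so the language is simply
# "a", zero or more spaces, "b".
GRAMMAR_AS_REGEX = re.compile(r"a *b")


def matches(text: str) -> bool:
    """True only if the whole input is consumed, with nothing left over."""
    return GRAMMAR_AS_REGEX.fullmatch(text) is not None


def strip_one_newline(text: str) -> str:
    """Policy 1 (hypothetical helper): drop exactly one trailing newline."""
    return text[:-1] if text.endswith("\n") else text


def strip_all_newlines(text: str) -> str:
    """Policy 2 (hypothetical helper): drop every trailing newline."""
    return text.rstrip("\n")


raw = "a   b\n"  # same bytes as the od -a dump: a sp sp sp b nl

print(matches(raw))                      # False: the newline has no parse
print(matches(strip_one_newline(raw)))   # True
print(matches(strip_all_newlines(raw)))  # True
```

The two policies only differ when a file ends in more than one newline: stripping exactly one from "a   b\n\n" still leaves a newline that the grammar cannot consume.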
Received on Friday, 31 December 2021 10:32:35 UTC