Further thoughts on delimiter choices from Norm Tovey-Walsh on 2022-01-12 (public-ixml@w3.org from January 2022)

From: Norm Tovey-Walsh <norm@saxonica.com>
Date: Wed, 12 Jan 2022 06:36:33 +0000
To: ixml <public-ixml@w3.org>
Message-ID: <m2wnj5cu1w.fsf@saxonica.com>

Hello,

Thinking about pragmas led me to think about qualified names, and that
led me to speculate about plausible V.next changes to add namespace
declarations to ixml.

I quickly persuaded myself that the problems involved were thorny enough
to leave namespaces out of v1. I say that a little reluctantly because I
expect the language design issues will be much thornier then than they
are now. But I think it’s probably still true.

(I’d be prepared to argue for leaving them out permanently were it not
for the fact that some users will use ixml to produce XML forms that
require multiple namespaces (prose with embedded MathML or SVG, for
example), at which point they’ll have to find a way to encode namespaces
in XML without namespaces and that will be both inconvenient and ugly.)

Hypothesis: if the nonterminal @date represents an attribute named date,
then users will reasonably infer that the nonterminal @xi:include
represents an attribute named xi:include.

We don’t *have* to allow colons in nonterminal names to support
namespaces in ixml, but I think there will be a lot of pressure to do
so.

There are two bits of optionality in ixml that seem arbitrary to me: you
can delimit the nonterminal in a rule with either “:” or “=” and you can
delimit alternatives on the right-hand side with either “;” or “|”.

I assume, without trying to find discussion in the archives, that this
is just because there are a bunch of different (E)BNF grammars out
there, some use “:” and some use “=”, some use “;” and some use “|”, so
let’s just let authors do what makes them happy.

Fine, I guess, but if we add QNames to ixml in the future, using “:” as
a delimiter if there are colons in the nonterminal will sometimes be
ambiguous, for example:

  this:that.something:else.

I would prefer to be able to parse that without appeal to the namespaces
declared. And even if we couldn’t get agreement on that idea, it will
still be possible to write ambiguous rules if the NCName parts are each
declared as namespace prefixes.
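
A minimal sketch of the ambiguity, in Python rather than ixml, and not
taken from any implementation. It enumerates the ways the example line
could be read under a toy grammar. Everything in it is an assumption made
for illustration: names may contain “.”, a hypothetical qualified name is
written prefix:local, a right-hand side is a single name, both “:” and
“=” delimit a rule, and rules may follow one another without whitespace.
The function names are likewise hypothetical.

```python
import re

# Toy NCName: a letter or "_" followed by letters, digits, "_", "." or "-".
# This is an illustrative assumption, not the ixml name grammar.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")
DELIMS = ":="  # both rule delimiters allowed, as in the message above


def ncnames(text, pos):
    """Yield every (ncname, end) that can start at pos, shortest first."""
    m = NCNAME.match(text, pos)
    if m:
        for end in range(pos + 1, m.end() + 1):
            yield text[pos:end], end


def names(text, pos):
    """Yield plain names and hypothetical prefix:local names at pos."""
    for first, end in ncnames(text, pos):
        yield first, end
        if text.startswith(":", end):
            for local, end2 in ncnames(text, end + 1):
                yield first + ":" + local, end2


def rules(text, pos):
    """Yield ((lhs, rhs), end) for every toy rule 'lhs DELIM rhs .' at pos."""
    for lhs, p in names(text, pos):
        if p < len(text) and text[p] in DELIMS:
            for rhs, q in names(text, p + 1):
                if text.startswith(".", q):
                    yield (lhs, rhs), q + 1


def readings(text, pos=0):
    """Yield every way to split text into a sequence of toy rules."""
    if pos == len(text):
        yield []
        return
    for rule, nxt in rules(text, pos):
        for rest in readings(text, nxt):
            yield [rule] + rest


if __name__ == "__main__":
    for reading in readings("this:that.something:else."):
        print(reading)
```

Under those assumptions the line has three readings: two rules (this
defined as that, then something defined as else), one rule defining this
as that.something:else, and one rule defining this:that.something as
else. None of them can be ruled out without knowing which prefixes are
declared.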

You could say, if authors want to avoid that ambiguity, they should just
use “=” when they write the rule (or spaces around one of the colons):

  this=that.something:else.

or

  this:that.something=else.

And that’s fair. But it means that the specification will still have to
talk about the ambiguity problem, implementors will have to detect it,
tests will have to be written, conformance will have to be described,
error messages will have to be crafted, and users will have to be
trained.

I propose that we avoid all that by saying that “=” is the only
separator allowed in rules. (I am reminded of a conversation, from
another millennium, about order in content models in which one line of
reasoning presented was: “if order doesn’t matter, pick one and enforce
it!”)

I also propose that we settle on “;” as the optionality delimiter.
Bethan observed over dinner that “|” is a useful metacharacter and we
might want to save it for later. (Or we could use it today as the pragma
delimiter; that would also be an option.)

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica
Received on Wednesday, 12 January 2022 07:06:19 UTC