Re: Here we go again, pragmas… from C. M. Sperberg-McQueen on 2022-02-28 (public-ixml@w3.org from February 2022)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Mon, 28 Feb 2022 09:46:43 -0700
To: Norm Tovey-Walsh <norm@saxonica.com>
Cc: public-ixml@w3.org
Message-ID: <87ilszos5o.fsf@blackmesatech.com>

Norm Tovey-Walsh writes:

> We have to talk about pragmas again eventually. One of my goals after
> getting my implementation running was to be able to experiment with some
> concrete proposals.
>
> Here’s one, expressed as a diff off the ixml grammar:

> ...

We didn't have enough proposals floating around? 

> I sort of like how small the footprint of this change is, so I didn’t
> try to file off every possible edge case.

With respect, I don't think size of change footprint is a good design
criterion.  It does have various practical advantages, but if taken too
seriously it leads, as here and in SP's 'Strawman' proposal, to just
jamming pragmas into s, which I think is a mistake.

When whitespace is eaten by the parser and not passed to the consuming
application, it does not matter at all which nonterminal the whitespace
falls in, and when deciding where s should appear in rules, the grammar
writer can follow any pattern that helps them ensure that whitespace can
be used in all the intuitively correct places and that the use of s in
the grammar does not introduce ambiguities.

In the ixml grammar, Steven has consistently adopted the principle that
an s follows any terminal symbol in the grammar. This is roughly
analogous to the discipline in a lexical scanner that says "read the
expected terminal *and any following whitespace*", and it has worked
very well for whitespace.
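
For readers who have not met that scanner discipline, here is a minimal
sketch of it in Python.  It is an illustration only and is not taken
from any ixml processor; the function name expect and its signature are
hypothetical.

```python
def expect(text, pos, terminal):
    """Consume `terminal` at offset `pos`, then any following whitespace.

    Returns the offset of the first character after that whitespace.
    Hypothetical helper for illustration; not part of any ixml tool.
    """
    if not text.startswith(terminal, pos):
        raise SyntaxError(f"expected {terminal!r} at offset {pos}")
    pos += len(terminal)
    # The whitespace is swallowed together with the terminal, so the
    # caller never has to decide which construct it belongs to.
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos
```

Calling expect("S  =  a .", 0, "S") returns 3, the offset of the "=":
the caller never sees the two spaces in between.  That is harmless as
long as what is skipped is discarded, which is the case for whitespace
but not for comments.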

It has not worked so well for comments, both from the human point of
view and from the processor's point of view.  When working with grammars
in XML, it seems obviously right to be able to insert comments between
rules, thus:

  <rule name="S"><alt><nonterminal name="a"/></alt></rule>

  <comment>This next bit is a little tricky, as we have to
    avoid over-frobbing the diddums.  Watch carefully.</comment>
  <rule name="a"><alt> ... </alt></rule>


In the version of the ixml grammar current at the time I started doing
this, I was surprised to learn that this was not allowed, in the sense
that no conforming ixml processor could possibly produce that XML from a
conforming ixml grammar.

In ixml, I might have written the beginning of the grammar as

  S = a.

  {This next bit is a little tricky ... Watch carefully.}
  a = ... .

But the XML that came out of this, given the then-current ixml grammar,
was not the XML shown above; instead it was something like this:

  <rule name="S">
    <alt><nonterminal name="a"/></alt>
    <comment>This next bit is a little tricky, as we have to
    avoid over-frobbing the diddums.  Watch carefully.</comment>
  </rule>

  <rule name="a"><alt> ... </alt></rule>

The comment explaining the rule for nonterminal a gets embedded in the
rule for S, because in the ixml grammar s follows terminal symbols,
and the rule for rules ended with

   ... '.', s.

This is no longer the case, because we changed the rule for ixml to make
whitespace and comments occurring between the full stop of a rule and
the beginning of the next rule be children of the ixml node, not of the
rule node.  But for very similar reasons, a rule like

  empty = {nil}.

does not produce

  <rule name="empty"><alt><comment>nil</comment></alt></rule>

but something else; I'll leave working out the details as an exercise
for the reader.
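
As a rough illustration of the between-rules case described above (and
not of the empty = {nil}. case, which stays an exercise), the following
Python sketch contrasts the two attachment policies.  It is a toy model
under loud assumptions: it is not the ixml grammar, it only recognizes
rules ending in a full stop and brace-delimited comments outside rules,
and the names TOKEN and attach_comments are hypothetical.

```python
import re

# Toy tokenizer (an assumption, not ixml): a token is either a
# brace-delimited comment or a run of text ending in a full stop.
TOKEN = re.compile(r"\{[^{}]*\}|[^{}.]*\.")

def attach_comments(source, trailing_s_in_rule):
    """Return the top-level nodes of a toy grammar.

    A rule is ("rule", text, [comments attached to it]); a comment
    kept at the top level is ("comment", text).  If trailing_s_in_rule
    is true, a comment after a rule's full stop is attached to that
    rule (the old "'.', s" behaviour); otherwise it stays a sibling.
    """
    nodes = []
    for match in TOKEN.finditer(source):
        token = match.group().strip()
        if token.startswith("{"):
            comment = token[1:-1]
            if trailing_s_in_rule and nodes and nodes[-1][0] == "rule":
                nodes[-1][2].append(comment)   # swallowed by previous rule
            else:
                nodes.append(("comment", comment))
        else:
            nodes.append(("rule", token, []))
    return nodes
```

Given the source "S = a.\n\n{tricky}\na = b.", the call with
trailing_s_in_rule=True attaches "tricky" to the rule for S, while the
call with trailing_s_in_rule=False leaves it as a top-level node between
the two rules, which is the placement a human reader would expect.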

The rule of putting s as the following sibling of any terminal was also
responsible for problems in ranges, which I won't repeat here since they
are documented in the issues list.

The known use cases for pragmas include supplying additional information
about terminal and nonterminal symbols on the right hand sides of rules,
about rules, and about the grammar as a whole.  I think pragmas should
be defined in such a way that a plausible placement of a pragma in the
ixml produces a plausible placement of the pragma in the XML, and vice
versa.  What counts as plausible positioning is, I believe, a matter of
tact, technical intuition, and taste.  As Steven has already pointed out
in this discussion, some notations are more intuitive and easier to use
than others, so it's important to get things right.

Adding pragma to the right hand side of s in the current grammar does
not, I think, satisfy the design goal of allowing plausible placement of
pragmas in both the ixml and the XML forms of a grammar.  Tom and I
exhibited a grammar for pragmas that I think does satisfy that design
goal.  That proposal does have the consequence that one cannot just say
"Pragmas are allowed wherever comments are allowed" and it does not
provide a way to insert pragmas that involve additional information
about some constructs (the separator in a repetition, the repetition
itself, a nested set of alts; that is, any expression that is not a
rule, a grammar, or a symbol).  As noted above, and on other occasions,
if anyone
has a use case for such a pragma I would be very glad to see it.  But so
far, no one has suggested any use cases for pragmas other than the ones
Tom and I catalogued (and the act of cataloguing has been taken not as
evidence that we were performing due diligence in preparing the pragmas
proposal but as evidence that we want to implement non-standard features
in our processors and thus as a reason to oppose having pragmas in the
language at all).  I submit that attempting to support use cases we
cannot describe involves supporting use cases we do not understand, and
is unlikely to result in a successful design.


> There’s a part of me that would have expected all of the characters in
> the pragma, even nested comments and pragmas to appear as pragma data:
>
>   <pragma name="name">testing {a comment} and {[nested pragma]}</pragma>

> But that isn’t how comments work today, so I went with the simple thing
> and let pragmas work the same way.

Pragmas are not the same as comments.  I think this is one way they
differ.  

I believe I have already explained why I think this grammatical approach
is a design error and described more than once grammatical formulations
that avoid the error.

> This proposal fits my sort of bare minimum needed for pragmas. It’s the
> closest thing to a compromise solution that I’ve been able to imagine,
> which at least means it’ll probably have the distinction of being hated
> by *everyone*! :-)

'Hate' is too strong a word, but I'm not sold on this as a proposal.  I
think it makes design mistakes we know how to avoid and do not need to
make.

Michael


-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Received on Monday, 28 February 2022 16:47:04 UTC