Re: Thoughts on pragmas and iXML from Bethan Tovey-Walsh on 2025-02-05 (public-ixml@w3.org from February 2025)

From: Bethan Tovey-Walsh <bytheway@linguacelta.com>
Date: Wed, 5 Feb 2025 14:03:31 +0000
To: ixml <public-ixml@w3.org>
Message-Id: <E2A8619E-7EE0-4398-8C9F-35FA88EC6AB7@linguacelta.com>
I think Norm's message details a few things that I consider very important:

1. The "foundational rules", which were unanimously accepted as being of primary importance by the CG - i.e. the first two requirements: pragmas must not affect the syntactic validity, or change the semantics, of the base grammar. As Norm noted, it's simply impossible for an implementor or a grammar author to guarantee that their pragma will do neither of these things if there is no cross-implementation consistency about pragma scope. 

2. Interoperability - wherever possible, I think that the iXML specification should not encourage vendor lock-in. If every implementation has to invent its own semantics for the scope of pragmas, users of one implementation may find it hard to write grammars for a different implementation (as they will have to learn a different convention for placing pragmas in the grammar), and may also find that the grammars they write don't function correctly if they switch implementations. This increases the effort needed to switch implementations, which I think is a very bad thing.

3. Being considerate to users and implementors. This is already touched upon in point 2 - a consistent semantics for pragma scope means that users can always know what the scope of a pragma is, without having to decide in advance which implementation they will use and then check that their grammar conforms to that implementation's interpretation of pragma scope. Implementors, in turn, will not each independently have to come up with a mechanism for deciding a pragma's scope, with all the testing and bugfixing that will inevitably require. Consistency in the XML representation is also considerate to users - even if pragmas are "for communicating with software", they are inevitably part of a larger grammar, which must be human-readable and human-writable. A consistent semantics for pragma scope in both iXML and XML grammars will reduce the cognitive effort needed to understand those grammars.

For me, the three points above are the key ones in this discussion. If the specification is going to introduce a new feature, I think it's our duty to design that feature as well as we can, and to give users and implementors as much help as possible to write grammars that conform to the specification. The less rigorous our approach to this new feature is, the more work authors and implementors will have to do in order to write conformant software and conformant grammars.

It would be really useful to hear from some of those who oppose this requirement, as well as from any who are in agreement. I'd be particularly interested to hear arguments which refute my point 1 above, since I can't currently see any logical way around it. I'd love to get this discussion going over email; it's hard to get any substantive debate concluded in one hour a fortnight - and email also has the advantage of allowing more people to contribute to the discussion.

Very best,

BTW

___________________________________________________ 
Dr. Bethan Tovey-Walsh 

linguacelta.com <http://linguacelta.com/> 

Golygydd | Editor http://geirfan.cymru <http://geirfan.cymru/> 

Croeso i chi ysgrifennu ataf yn y Gymraeg.

> On 4 Feb 2025, at 18:30, Norm Tovey-Walsh <norm@saxonica.com> wrote:
> 
> Hello,
> 
> Although we made some progress at today’s iXML call, there were also a lot of disagreements. Here are some thoughts and observations while they’re fresh in my mind.
> 
> First, the observation was made that “we aren’t designing a new language feature for iXML!” To which I reply with equal enthusiasm: “yes we are.”
> 
> No matter *what* the iXML spec eventually says about pragmas, it is a specification. What it says about pragmas *is* the design of pragmas in iXML. So we ought to try to do the job well and in a way that’s useful to grammar authors and processor implementors.
> 
> Sometimes in the discussion of pragmas, XML processing instructions (PIs) are held up as an example of a successful design and one that we ought to emulate. I think that’s fine as far as it goes, but observe that processing instructions are a designed language feature: they have a predefined structure and they are allowed at specific places, and only at specific places. You can’t put a PI in an attribute value, or in an end tag, or in a comment. And you can’t nest them.
> 
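[Illustration: a minimal sketch of the point about PI placement, assuming Python's standard xml.etree.ElementTree parser as a stand-in for any conforming XML parser. The helper name is_well_formed, the element name "doc", and the PI target "proc" are hypothetical and chosen only for this example.]

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical documents: one PI in a permitted place, two in forbidden places.
cases = {
    "PI as element content": "<doc><?proc hint?></doc>",
    "PI in attribute value": '<doc a="<?proc hint?>"/>',
    "PI in end tag": "<doc></doc <?proc hint?>>",
}

results = {name: is_well_formed(text) for name, text in cases.items()}

for name, ok in results.items():
    print(f"{name}: {'accepted' if ok else 'rejected'}")
```

Only the first document should be accepted. The other two are rejected because "<" may not appear in an attribute value or inside an end tag, so the places where a PI can occur are fixed by the XML grammar rather than left to each processor.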
> The argument is made that iXML pragmas exist only for communicating with processors: it doesn’t matter what they look like or where they go, that’s for the processor. I think that argument is…just wrong. Humans write iXML grammars, humans read iXML grammars, humans using processors that understand pragmas care about pragmas and what goes in them.
> 
> Requirement 9 says: Pragmas must be able to annotate an iXML grammar as a whole, individual rules in a grammar, and nonterminals.
> 
> There was some disagreement on this point; appealing again to the notion that pragmas exist only for communicating with processors, it was suggested that it’s up to the processor to work out what they attach to, not our problem.
> 
> If we have any interest in interoperability, I think that’s a poorly conceived notion. But regardless, we’ve already agreed that pragmas must not change the semantics. Consider this grammar:
> 
>   S = A {[r (a|b)+]}, B.
> 
> Let’s say the “r” pragma is a directive to the processor that it is safe to use the specified regular expression. And let’s say that such a pragma is useful enough to have multiple implementations.
> 
> All processors must know if the pragma is intended for A or B. It might very well be the case that the “r” pragma does not change the semantics of the grammar if it’s attached to A, but it does if it’s attached to B. In order not to change the semantics, a processor must be able to recognize the pragma *and* know what its scope is. If it can’t tell, then with the best will in the world, you can’t be sure that the pragma won’t change the semantics of the grammar.
> 
> One way for a processor to leverage iXML is to use the iXML specification grammar to process an input grammar in order to understand what that grammar means. It follows that the location of pragmas in the XML serialization of an iXML grammar parsed with the specification grammar is of some interest.
> 
> Although the CG is in no way constrained to adopt a pragma syntax that’s based on comments, it appears inclined to do so. I don’t think that’s a bad idea, but I do think that the way comments, as currently defined in the iXML specification grammar, are inserted into the XML serialization is unsatisfactory.
> 
> Consider:
> 
> S{a}:{b} A {c}, {d} B .
> A : "a" | "A" .
> B : "b" | "B" .
> 
> That serializes as:
> 
> <ixml>
>   <rule name='S'>
>      <comment>a</comment>
>      <comment>b</comment>
>      <alt>
>         <nonterminal name='A'>
>            <comment>c</comment>
>         </nonterminal>
>         <comment>d</comment>
>         <nonterminal name='B'/>
>      </alt>
>   </rule>
>   …
> 
> 1. Comments “a” and “b” appear adjacent in the XML representation. If the placement around the rule separator is unknowable, then there can be no semantics attached to the placement of the comments on one side or the other, yet the fact that they do have a different placement in the iXML version of the grammar violates the expectation that XML siblings have the same relationship to each other and other grammar constructs.
> 
> 2. Comments “c” and “d” have fundamentally different relationships with the other constructs represented in the tree.
> 
> Parent/child and sibling relationships are the most obvious and natural in XML. They are a fundamental feature of good XML design. If you imagine that the parent/child relationships are significant to the semantics of a pragma, then “c” is associated with its parent construct (the nonterminal named “A”) and “d” is associated with its parent construct (the alt). If you imagine instead that sibling relationships are significant, then “c” has no siblings and “d” is the following sibling of one nonterminal and the preceding sibling of another. For the purposes of establishing the semantics of a pragma, comment placement doesn’t currently permit a consistent, logical semantics.
> 
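[Illustration: a rough sketch that makes the parent and sibling relationships described above concrete, using Python's standard xml.etree.ElementTree. It assumes only the S rule from the serialization shown earlier; the elided remainder of the document is left out and the root element is simply closed so that the fragment parses. The names label and comment_context are hypothetical helpers.]

```python
import xml.etree.ElementTree as ET

# The S rule as serialized above; the elided rules for A and B are omitted
# and </ixml> is added so the fragment is well-formed (an assumption).
XML = """
<ixml>
  <rule name='S'>
     <comment>a</comment>
     <comment>b</comment>
     <alt>
        <nonterminal name='A'>
           <comment>c</comment>
        </nonterminal>
        <comment>d</comment>
        <nonterminal name='B'/>
     </alt>
  </rule>
</ixml>
"""


def label(element):
    """Describe an element as 'tag' or 'tag[name]'."""
    name = element.get("name")
    return f"{element.tag}[{name}]" if name else element.tag


def comment_context(root):
    """Map each comment's text to (parent, preceding sibling, following sibling)."""
    context = {}
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if child.tag != "comment":
                continue
            before = label(children[i - 1]) if i > 0 else None
            after = label(children[i + 1]) if i + 1 < len(children) else None
            context[child.text] = (label(parent), before, after)
    return context


context = comment_context(ET.fromstring(XML))
for text, (parent, before, after) in context.items():
    print(f"{text}: parent={parent}, preceding={before}, following={after}")
```

In this sketch, "a" and "b" share the rule as parent, "c" sits alone inside the nonterminal A, and "d" sits inside the alt between the two nonterminals. Neither the parent reading nor the sibling reading gives all four comments the same kind of relationship to the construct they follow in the iXML source.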
> Is it *possible* to write software that works around these obvious warts in the design? Of course it is. Can the humans who have to write the grammars in the first place work around these warts? Probably.
> 
> But why would we make the lives of both users and implementors difficult by giving them a design whose pitfalls are so predictable and obvious, when it is well within our power to do so much better just by spending the time to design consciously rather than asserting that no designing is required?
> 
> Finally, I observe that nothing about making pragma placement consistent, rational, and interoperable in any way limits an implementor’s freedom to use pragmas for any wild thing they can dream up. There can be no less expressive power in deliberate placement than there is in accidental placement.
> 
>                                        Be seeing you,
>                                          norm
> 
> --
> Norm Tovey-Walsh
> Saxonica
>
Received on Wednesday, 5 February 2025 14:03:50 UTC