Re: Thoughts on pragmas and iXML from Steven Pemberton on 2025-02-17 (public-ixml@w3.org from February 2025)

From: Steven Pemberton <steven.pemberton@cwi.nl>
Date: Mon, 17 Feb 2025 15:10:12 +0000
To: public-ixml@w3.org
Message-Id: <1739804508777.3478934264.2946673746@cwi.nl>
The reason I think we're not designing for ixml is that pragmas are a sort of comment directed only at software: a processor, a preprocessor, a pretty printer, a linter; ixml proper is none the wiser. Leave them out, and everything remains the same for ixml.

I understand that different pieces of software have different requirements for what pragmas should do, and I think that very software should be free to determine them. I feel we should leave the design to the software writers whose software will respond to the pragmas, and that we shouldn't impose these things on them.

I'm not saying that the things that are being suggested shouldn't be allowed; just that it is up to the writers of the software to design its interfaces.

That's why I think the XML PI solution is the best: maximum freedom for the software, and minimal hassle for us, since we don't have to try and guess what their needs will be.

Steven

On Tuesday 04 February 2025 19:30:23 (+01:00), Norm Tovey-Walsh wrote:

> Hello,
> 
> Although we made some progress at today’s iXML call, there were also a lot of disagreements. Here are some thoughts and observations while they’re fresh in my mind.
> 
> First, the observation was made that “we aren’t designing a new language feature for iXML!” To which I reply with equal enthusiasm: “yes we are.”
> 
> No matter *what* the iXML spec eventually says about pragmas, it is a specification. What it says about pragmas *is* the design of pragmas in iXML. So we ought to try to do the job well and in a way that’s useful to grammar authors and processor implementors.
> 
> Sometimes in the discussion of pragmas, XML processing instructions (PIs) are held up as an example of a successful design and one that we ought to emulate. I think that’s fine as far as it goes, but observe that processing instructions are a designed language feature: they have a predefined structure and they are allowed at specific places, and only at specific places. You can’t put a PI in an attribute value, or in an end tag, or in a comment. And you can’t nest them.
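The placement rules for PIs described above can be checked with a short Python sketch using the standard library's `xml.dom.minidom`. The PI target `proc`, its data `hint`, and the helper names `is_well_formed` and `count_pis` are made up for illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(doc):
    """Return True if the XML parser accepts the document."""
    try:
        minidom.parseString(doc)
        return True
    except ExpatError:
        return False


def count_pis(doc):
    """Count processing-instruction nodes anywhere in the parsed tree."""
    def walk(node):
        here = 1 if node.nodeType == node.PROCESSING_INSTRUCTION_NODE else 0
        return here + sum(walk(child) for child in node.childNodes)
    return walk(minidom.parseString(doc))


# Hypothetical PI "<?proc hint?>" tried at several positions.
between_elements = "<doc><?proc hint?><a/></doc>"       # allowed
in_attribute = '<doc a="<?proc hint?>"/>'               # "<" is illegal in an attribute value
in_end_tag = "<doc></doc <?proc hint?>>"                # nothing but whitespace may follow the name
in_comment = "<doc><!-- <?proc hint?> --></doc>"        # parses, but it is only comment text
nested = "<doc><?outer <?inner x?> y?></doc>"           # parses, but as ONE PI followed by text

print(is_well_formed(between_elements), count_pis(between_elements))
print(is_well_formed(in_attribute))
print(is_well_formed(in_end_tag))
print(is_well_formed(in_comment), count_pis(in_comment))
print(is_well_formed(nested), count_pis(nested))
```

The two rejected documents and the two "accepted but not a PI where you wrote it" documents show that PI placement is constrained by the XML grammar itself rather than left to each consuming application.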
> 
> The argument is made that iXML pragmas exist only for communicating with processors: it doesn’t matter what they look like or where they go, that’s for the processor. I think that argument is…just wrong. Humans write iXML grammars, humans read iXML grammars, humans using processors that understand pragmas care about pragmas and what goes in them.
> 
> Requirement 9 says: Pragmas must be able to annotate an iXML grammar as a whole, individual rules in a grammar, and nonterminals.
> 
> There was some disagreement on this point; appealing again to the notion that pragmas exist only for communicating with processors, it was suggested that it’s up to the processor to work out what they attach to, not our problem.
> 
> If we have any interest in interoperability, I think that’s a poorly conceived notion. But regardless, we’ve already agreed that pragmas must not change the semantics. Consider this grammar:
> 
>    S = A {[r (a|b)+]}, B.
> 
> Let’s say the “r” pragma is a directive to the processor that it is safe to use the specified regular expression. And let’s say that such a pragma is useful enough to have multiple implementations.
> 
> All processors must know if the pragma is intended for A or B. It might very well be the case that the “r” pragma does not change the semantics of the grammar if it’s attached to A, but it does if it’s attached to B. In order not to change the semantics, a processor must be able to recognize the pragma *and* know what its scope is. If it can’t tell, then with the best will in the world, you can’t be sure that the pragma won’t change the semantics of the grammar.
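The scoping problem can be made concrete with a small Python sketch. It assumes two hypothetical attachment policies, "preceding" and "following"; neither is taken from any real processor or from the specification, and the tokenizer handles only the tiny rule shown above, not iXML in general.

```python
import re

# Toy tokenizer for the example rule only: a "{[ ... ]}" pragma,
# a name, or one of the punctuation marks "=", ",", ".".
TOKEN = re.compile(r"\{\[.*?\]\}|[A-Za-z]\w*|[=,.]")


def tokens(rule):
    """Split the example rule into pragma, name, and punctuation tokens."""
    return TOKEN.findall(rule)


def attach(rule, policy):
    """Return the name the first pragma attaches to under a hypothetical policy.

    policy == "preceding": nearest name before the pragma.
    policy == "following": nearest name after the pragma.
    """
    toks = tokens(rule)
    at = next(i for i, t in enumerate(toks) if t.startswith("{["))
    if policy == "preceding":
        candidates = reversed(toks[:at])
    elif policy == "following":
        candidates = toks[at + 1:]
    else:
        raise ValueError(f"unknown policy: {policy}")
    return next(t for t in candidates if t[0].isalpha())


RULE = "S = A {[r (a|b)+]}, B."
print(attach(RULE, "preceding"))
print(attach(RULE, "following"))
```

Two processors that each pick a different one of these policies would read the same grammar text and apply the “r” pragma to different nonterminals, which is exactly the situation in which one of them may change the semantics.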
> 
> One way for a processor to leverage iXML is to use the iXML specification grammar to process an input grammar in order to understand what that grammar means. It follows that the location of pragmas in the XML serialization of an iXML grammar parsed with the specification grammar is of some interest.
> 
> Although the CG is in no way constrained to adopt a pragma syntax that’s based on comments, it appears inclined to do so. I don’t think that’s a bad idea, but I do think that the way comments, as currently defined in the iXML specification grammar, are inserted into the XML serialization is unsatisfactory.
> 
> Consider:
> 
> S{a}:{b} A {c}, {d} B .
> A : "a" | "A" .
> B : "b" | "B" .
> 
> That serializes as:
> 
> <ixml>
>    <rule name='S'>
>       <comment>a</comment>
>       <comment>b</comment>
>       <alt>
>          <nonterminal name='A'>
>             <comment>c</comment>
>          </nonterminal>
>          <comment>d</comment>
>          <nonterminal name='B'/>
>       </alt>
>    </rule>
>    …
> 
> 1. Comments “a” and “b” appear adjacent in the XML representation. If the placement around the rule separator is unknowable, then there can be no semantics attached to the placement of the comments on one side or the other, yet the fact that they do have a different placement in the iXML version of the grammar violates the expectation that XML siblings have the same relationship to each other and other grammar constructs.
> 
> 2. Comments “c” and “d” have fundamentally different relationships with the other constructs represented in the tree.
> 
> Parent/child and sibling relationships are the most obvious and natural in XML. They are a fundamental feature of good XML design. If you imagine that the parent/child relationships are significant to the semantics of a pragma, then “c” is associated with its parent construct (the nonterminal named “A”) and “d” is associated with its parent construct (the alts). If you imagine instead that sibling relationships are significant, then “c” has no siblings and “d” is the following sibling of one nonterminal and the preceding sibling of another. For the purposes of establishing the semantics of a pragma, comment placement doesn’t currently permit a consistent, logical semantics.
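To see these relationships directly, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It loads only the fragment of the serialization quoted above (the rules elided by "…" are left out and the root element is closed so the fragment parses), and the helper names `label` and `relationships` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The quoted fragment only; the elided rules for A and B are omitted.
SERIALIZATION = """
<ixml>
   <rule name='S'>
      <comment>a</comment>
      <comment>b</comment>
      <alt>
         <nonterminal name='A'>
            <comment>c</comment>
         </nonterminal>
         <comment>d</comment>
         <nonterminal name='B'/>
      </alt>
   </rule>
</ixml>
"""


def label(el):
    """Short human-readable label for an element, or None if there is none."""
    if el is None:
        return None
    if el.get("name"):
        return f"{el.tag} {el.get('name')}"
    if el.tag == "comment":
        return f"comment {el.text}"
    return el.tag


def relationships(root):
    """Map each comment's text to (parent, preceding sibling, following sibling)."""
    found = {}
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if child.tag != "comment":
                continue
            before = children[i - 1] if i > 0 else None
            after = children[i + 1] if i + 1 < len(children) else None
            found[child.text] = (label(parent), label(before), label(after))
    return found


rels = relationships(ET.fromstring(SERIALIZATION))
for text in sorted(rels):
    print(text, rels[text])
```

Running it shows “a” and “b” as indistinguishable siblings under the rule, “c” as an only child of nonterminal A, and “d” as a child of the alt sitting between the two nonterminals, so neither a parent-based nor a sibling-based reading gives all four comments a uniform meaning.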
> 
> Is it *possible* to write software that works around these obvious warts in the design? Of course it is. Can the humans who have to write the grammars in the first place work around these warts? Probably.
> 
> But why would we make the lives of both users and implementors difficult by giving them a design whose pitfalls are so predictable and obvious, when it is well within our power to do so much better just by spending the time to design consciously rather than asserting that no designing is required?
> 
> Finally, I observe that nothing about making pragma placement consistent, rational, and interoperable in any way limits an implementor’s freedom to use pragmas for any wild thing they can dream up. There can be no less expressive power in deliberate placement than there is in accidental placement.
> 
>                                         Be seeing you,
>                                           norm
> 
> --
> Norm Tovey-Walsh
> Saxonica
> 
>
Received on Monday, 17 February 2025 15:10:18 UTC