- From: Norm Tovey-Walsh <norm@saxonica.com>
- Date: Thu, 24 Mar 2022 14:21:11 +0000
- To: public-ixml@w3.org
- Message-ID: <m2tubnl945.fsf@Hackmatack.fritz.box>
Hello,

Since I said we should try to have some email discussion of pragmas before the next meeting, let me start.

Where are we on pragmas? Here’s my perspective. Some of this recapitulates parts of the pragmas proposal from Michael and Tom. I hope it’s helpful, and not distracting, for me to capture this perspective in an email message.

1. What are pragmas?

They’re a mechanism for a grammar author to signal to a processor that something is special about some part of the grammar. What constitutes “special” is open ended. If the processor doesn’t know what that kind of special is, it can just ignore the pragma.

A pragma might apply to the grammar as a whole, to a particular rule, or to a particular symbol. (It’s possible to go further than that, and we might want to go a little further, but that’s sufficient for the moment.)

Lots of languages make things we might call pragmas from a form that looks a bit like a comment. In line-oriented languages, that’s kind of ok.

  //#ifdef some-condition
  …
  //#endif

It’s not ideal in a lot of ways, but it works well enough. It doesn’t work especially well for things that need to be more closely bound to the pragma. Java uses annotations:

  @Test
  public void thisIsATest() {
    …
  }

So there’s clearly precedent for non-comment syntaxes.

For languages that aren’t line oriented, like XML, this doesn’t really work at all. Comments are effectively unusable because they can’t be nested. XML has a CDATA pragma built in:

  <![CDATA[ … ]]>

but it carries with it the awkward constraint that “]]>” is forbidden in content. In principle other pragmas of the form <![TOKEN[ could be invented (there were several others in SGML), but XML provides no mechanism for them to be defined.
In XML, I expect most pragma-like things boil down to being processing instructions:

  <?test?>
  <function name="thisIsATest">…</function>

Unfortunately, PIs don’t have a start and an end, so you need to manage them as tombstones, which is more than a little inconvenient.

2. Are pragmas comments?

Sure, inasmuch as, like comments, you can ignore them if you don’t care about pragmas, or if you encounter a pragma you don’t recognize, or if the moon is full.

3. Are pragmas comments?

No, not really. Even if we settle on a syntax that makes them look like a kind of comment, it’s not sufficient to just leave them as comments in the grammar. It’s easy to demonstrate that with an example. Consider:

  {[example grammar pragma]}
  {[example rule pragma]}
  symbol: A .
  A: {[example symbol 'a' pragma]} 'a', {[example symbol B pragma]} B.
  B: .

If you parse this with an ixml grammar that knows nothing about pragmas, those are comments, and the result is:

  <ixml>
     <comment>[example grammar pragma]</comment>
     <comment>[example rule pragma]</comment>
     <rule name="symbol">
        <alt>
           <nonterminal name="A"/>
        </alt>
     </rule>
     <rule name="A">
        <comment>[example symbol 'a' pragma]</comment>
        <alt>
           <literal string="a"/>
           <comment>[example symbol B pragma]</comment>
           <nonterminal name="B"/>
        </alt>
     </rule>
     <rule name="B">
        <alt/>
     </rule>
  </ixml>

This is unsatisfactory in a couple of ways. First, it’s impossible to distinguish between the pragmas that are intended to apply to the grammar as a whole and the pragmas that are supposed to apply to the first rule. Second, the pragma is not associated with its target. It looks like the association rule would have to be “next symbol”, but that doesn’t work for the ‘a’ pragma because its next symbol is the <alt>.
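To make the two problems concrete, here is a minimal sketch in Python. It is not part of any ixml implementation: the function next_sibling_targets and the rule it encodes ("a pragma-like comment applies to the next non-comment sibling") are assumptions made purely for illustration, applied to the comment-based output shown above.

```python
import xml.etree.ElementTree as ET

# Comment-based output from the example above (whitespace condensed).
COMMENT_XML = """
<ixml>
  <comment>[example grammar pragma]</comment>
  <comment>[example rule pragma]</comment>
  <rule name="symbol"><alt><nonterminal name="A"/></alt></rule>
  <rule name="A">
    <comment>[example symbol 'a' pragma]</comment>
    <alt>
      <literal string="a"/>
      <comment>[example symbol B pragma]</comment>
      <nonterminal name="B"/>
    </alt>
  </rule>
  <rule name="B"><alt/></rule>
</ixml>
"""


def next_sibling_targets(root):
    """Hypothetical association rule: a comment whose text starts with '['
    is treated as a pragma that applies to its next non-comment sibling."""
    pairs = []
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if child.tag != "comment" or not (child.text or "").startswith("["):
                continue
            target = next((c for c in children[i + 1:] if c.tag != "comment"), None)
            if target is None:
                label = None
            else:
                # Identify the target by tag plus its name or string attribute.
                label = (target.tag, target.get("name") or target.get("string"))
            pairs.append((child.text, label))
    return pairs


pairs = next_sibling_targets(ET.fromstring(COMMENT_XML))
for text, label in pairs:
    print(text, "->", label)
```

Under this assumed rule, the grammar pragma and the rule pragma both resolve to the rule named "symbol", so they cannot be told apart, and the 'a' pragma resolves to the <alt> element rather than to the literal it was written next to. Only the B pragma lands on its intended target.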
If, instead, we extend the ixml grammar to have pragmas as distinct from comments, we can do much better:

  <ixml>
     <prolog>
        <pragma name="example">
           <pragma-data>grammar pragma</pragma-data>
        </pragma>
     </prolog>
     <rule name="symbol">
        <pragma name="example">
           <pragma-data>rule pragma</pragma-data>
        </pragma>
        <alt>
           <nonterminal name="A"/>
        </alt>
     </rule>
     <rule name="A">
        <alt>
           <literal string="a">
              <pragma name="example">
                 <pragma-data>symbol 'a' pragma</pragma-data>
              </pragma>
           </literal>
           <nonterminal name="B">
              <pragma name="example">
                 <pragma-data>symbol B pragma</pragma-data>
              </pragma>
           </nonterminal>
        </alt>
     </rule>
     <rule name="B">
        <alt/>
     </rule>
  </ixml>

Now pragmas are properly contained by the elements to which they apply.

In fairness, the input for that example is very slightly different from the ixml shown above. It begins:

  {[example grammar pragma]} .

We need the slight extension of a full stop after pragmas that apply to the whole grammar in order to distinguish them from pragmas for the first rule.

4. How much change is required to the ixml grammar?

Not a lot, actually. The output above comes from a grammar that differs from the current ixml grammar only in the following rules:

  prolog: s, (ppragma+s, s)?.
  -ppragma: pragma, s, -'.'.
  comment: -"{", ((comment; ~["[]{}"]), (cchar; comment)*)?, -"}".
  rule: annotation, name, s, -["=:"], s, -alts, (pragma, sp)?, -".".
  nonterminal: annotation, name, s.
  -annotation: (pragma, sp)?, (mark, sp)?.
  -sp: (whitespace; comment; pragma)*.
  -quoted: tannotation, string, s.
  -tannotation: (pragma, sp)?, (tmark, sp)?.
  -encoded: tannotation, -"#", hex, s.
  inclusion: tannotation, set.
  exclusion: tannotation, -"~", s, set.
  pragma: -"{[", @pmark?, @name, (whitespace, pragma-data)?, -"]}" .
  @pmark: ["@^?"].
  pragma-data: (-pragma-char; -bracket-pair)*.
  -pragma-char: ~["{}"].
  -bracket-pair: '{', -pragma-data, '}'.

Note: if we decided to support QNames for pragma names, there would be a couple of additional rules.

5. Do we really need pragmas?
Yes, I think we do. I think the use cases provided in the pragmas proposal make a compelling case. I’ve already implemented a few, including a pragma that allows the parser to use a greedy regex match for symbols. I think that’s got potential to really improve performance.

6. What about things that should be part of the language?

It’s possible that implementors will use pragmas to support features not otherwise in the language. I don’t think that’s a compelling argument against pragmas. When the language supports those features natively, the pragmas will fade away. In the meantime, it’s a way to experiment with designs that might usefully inform future versions of the language.

Implementors will implement. It seems like it’s better to provide an extensibility mechanism that’s broadly interoperable than to have all extensibility be done by incompatible grammar extension.

For what it’s worth, I’m currently using the pragmas grammar as described above by default in my implementation. For a grammar that doesn’t use pragmas, it makes no difference. For a grammar that does use pragmas, the implementation either does what they’re programmed to mean or it ignores them.

7. Open questions

I think there are two open questions, beyond “are we going to support them” which I hope will not be open much longer!

a. Are pragmas going to be identified by name or by QName?

Personally, I think we should just use @name until such time as @qname is allowed in other places in the grammar, like the names of nonterminals. If ever, since I’ve come to the conclusion we shouldn’t do that :-)

b. Precisely how should the grammar include pragmas?

The grammar from the pragmas proposal is good enough I think. That’s basically what you see above, except that I think I had to tweak it in a couple of small ways to remove ambiguity.
It might be worth exploring a few small changes, so that a pragma can apply to a following alts group instead of only a following symbol, but I don’t think it’s a show stopper if we decide not to.

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica
Received on Thursday, 24 March 2022 14:26:50 UTC