- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Mon, 28 Feb 2022 09:46:43 -0700
- To: Norm Tovey-Walsh <norm@saxonica.com>
- Cc: public-ixml@w3.org
Norm Tovey-Walsh writes:

> We have to talk about pragmas again eventually. One of my goals after
> getting my implementation running was to be able to experiment with some
> concrete proposals.
>
> Here’s one, expressed as a diff off the ixml grammar:
> ...

We didn't have enough proposals floating around?

> I sort of like how small the footprint of this change is, so I didn’t
> try to file off every possible edge case.

With respect, I don't think size of change footprint is a good design criterion. It does have various practical advantages, but if taken too seriously it leads, as here and in SP's 'Strawman' proposal, to just jamming pragmas into s, which I think is a mistake.

When whitespace is eaten by the parser and not passed to the consuming application, it does not matter at all which nonterminal the whitespace falls in, and when deciding where s should appear in rules the grammar writer can follow any pattern that helps them ensure that whitespace can be used in all the intuitively correct places and that the use of s in the grammar does not introduce ambiguities.

In the ixml grammar, Steven has consistently adopted the principle that an s follows any terminal symbol in the grammar. This is roughly analogous to the discipline in a lexical scanner that says "read the expected terminal *and any following whitespace*", and it has worked very well for whitespace.

It has not worked so well for comments, both from the human point of view and from the processor's point of view.

When working with grammars in XML, it seems obviously right to be able to insert comments between rules, thus:

    <rule name="S"><alt><nonterminal name="a"/></alt></rule>
    <comment>This next bit is a little tricky, as we have
    to avoid over-frobbing the diddums. Watch carefully.</comment>
    <rule name="a"><alt> ... </alt></rule>

In the version of the ixml grammar current at the time I started doing this, I was surprised to learn that this was not allowed, in the sense that no conforming ixml processor could possibly produce that XML from a conforming ixml grammar. In ixml, I might have written the beginning of the grammar as

    S = a.
    {This next bit is a little tricky ... Watch carefully.}
    a = ... .

But the XML that came out of this, given the then current ixml grammar, was not the XML shown above; instead it was something like this:

    <rule name="S">
      <alt><nonterminal name="a"/></alt>
      <comment>This next bit is a little tricky, as we have
      to avoid over-frobbing the diddums. Watch carefully.</comment>
    </rule>
    <rule name="a"><alt> ... </alt></rule>

The comment explaining the rule for nonterminal a gets embedded in the rule for S. This happened because in the ixml grammar s follows terminal symbols, and the rule for rules ended with ... '.', s.

This is no longer the case, because we changed the rule for ixml to make whitespace and comments occurring between the full stop of a rule and the beginning of the next rule be children of the ixml node, not of the rule node.

But for very similar reasons, a rule like

    empty = {nil}.

does not produce

    <rule name="empty"><alt><comment>nil</comment></alt></rule>

but something else; I'll leave working out the details as an exercise for the reader.

The rule of putting s as the following sibling of any terminal was also responsible for problems in ranges, which I won't repeat here since they are documented in the issues list.

The known use cases for pragmas include supplying additional information about terminal and nonterminal symbols on the right hand sides of rules, about rules, and about the grammar as a whole. I think pragmas should be defined in such a way that a plausible placement of a pragma in the ixml produces a plausible placement of the pragma in the XML, and vice versa.
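
The comment-placement effect described above (a comment written after a rule's full stop ending up inside that rule) can be illustrated with a minimal sketch. It is a toy, not the ixml grammar or any ixml processor: the miniature rule syntax `name = name.`, the functions `tokenize` and `parse`, and the flag `trailing_s_in_rule` are all hypothetical names invented for illustration, and comments here do not nest.

```python
import re

# Toy tokenizer: names, '=', '.', and brace-delimited comments (no nesting).
# This is NOT the real ixml grammar; it is a hypothetical miniature.
TOKEN = re.compile(r"\s*(\{[^{}]*\}|[A-Za-z]\w*|[=.])")


def tokenize(text):
    """Split the input into names, '=', '.', and {comment} tokens."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_comment(token):
    return token.startswith("{")


def parse(text, trailing_s_in_rule):
    """Parse a sequence of toy rules into a list of top-level nodes.

    trailing_s_in_rule=True models a rule that ends with '.', s:
    comments after the full stop are consumed as children of that rule.
    trailing_s_in_rule=False leaves them as children of the top-level node.
    """
    tokens = tokenize(text)
    top = []  # children of the top-level node
    i = 0
    while i < len(tokens):
        if is_comment(tokens[i]):
            top.append(("comment", tokens[i][1:-1]))
            i += 1
            continue
        head = tokens[i:i + 4]
        if (len(head) != 4 or not head[0][0].isalpha() or head[1] != "="
                or not head[2][0].isalpha() or head[3] != "."):
            raise ValueError("expected a rule of the form: name = name .")
        children = [("nonterminal", head[2])]
        i += 4
        if trailing_s_in_rule:
            # The rule itself swallows any comments that follow its full stop.
            while i < len(tokens) and is_comment(tokens[i]):
                children.append(("comment", tokens[i][1:-1]))
                i += 1
        top.append(("rule", head[0], children))
    return top


source = "S = a. {This next bit is tricky.} a = b."
for flag in (True, False):
    print(parse(source, flag))
```

With the flag set, the comment becomes the last child of the rule for S; with it cleared, the comment sits between the two rules as a sibling, which is the placement the XML example with a top-level comment element expects.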

What counts as plausible positioning is, I believe, a matter of tact, technical intuition, and taste. As Steven has already pointed out in this discussion, some notations are more intuitive and easier to use than others, so it's important to get things right.

Adding pragma to the right hand side of s in the current grammar does not, I think, satisfy the design goal of allowing plausible placement of pragmas in both the ixml and the XML forms of a grammar.

Tom and I exhibited a grammar for pragmas that I think does satisfy that design goal.

That proposal does have the consequence that one cannot just say "Pragmas are allowed wherever comments are allowed" and it does not provide a way to insert pragmas that involve additional information about some constructs (the separator in a repetition, the repetition itself, a nested set of alts -- any expression that is not a rule, a grammar, or a symbol).

As noted above, and on other occasions, if anyone has a use case for such a pragma I would be very glad to see it. But so far, no one has suggested any use cases for pragmas other than the ones Tom and I catalogued (and the act of cataloguing has been taken not as evidence that we were performing due diligence in preparing the pragmas proposal but as evidence that we want to implement non-standard features in our processors and thus as a reason to oppose having pragmas in the language at all).

I submit that attempting to support use cases we cannot describe involves supporting use cases we do not understand, and is unlikely to result in a successful design.

> There’s a part of me that would have expected all of the characters in
> the pragma, even nested comments and pragmas to appear as pragma data:
>
> <pragma name="name">testing {a comment} and {[nested pragma]}</pragma>
> But that isn’t how comments work today, so I went with the simple thing
> and let pragmas work the same way.

Pragmas are not the same as comments. I think this is one way they differ.

I believe I have already explained why I think this grammatical approach is a design error and described more than once grammatical formulations that avoid the error.

> This proposal fits my sort of bare minimum needed for pragmas. It’s the
> closest thing to a compromise solution that I’ve been able to imagine,
> which at least means it’ll probably have the distinction of being hated
> by *everyone*! :-)

'Hate' is too strong a word, but I'm not sold on this as a proposal. I think it makes design mistakes we know how to avoid and do not need to make.

Michael

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Received on Monday, 28 February 2022 16:47:04 UTC