Re: Thoughts on pragmas and iXML from Bethan Tovey-Walsh on 2025-02-12 (public-ixml@w3.org from February 2025)

From: Bethan Tovey-Walsh <bytheway@linguacelta.com>
Date: Wed, 12 Feb 2025 16:55:20 +0000
To: Graydon Saunders <graydonish@gmail.com>
Cc: ixml <public-ixml@w3.org>
Message-Id: <5BB1CE44-B913-4C21-9CF2-BCF309D3FBD8@linguacelta.com>
> It could be left up to the implementers, but
> if there's any notion of scope it might be preferable to define which
> instance of the pragma applies in what context.


As I understand it, you're thinking of a case where (for example) both a rule and one of the nonterminals on its RHS have pragmas attached. Let's assume, without loss of generality, that we're using a version of the pragma syntax described in the previous proposal by Michael and Tomos. Let's also assume that I have a pragma allowing me to specify a regex that recognizes the same language as the iXML construct to which it attaches. A grammar containing instances of that pragma might look like this:

 ixml version "1.1".

 sentence: animals , verb , food.

 {[regex "([dh]og|lion(esse)?|hipster)s"]} animals :
  {[regex "[dh]ogs"]} animals_1;
  {[regex "lion(esse)?s"]} animals_2;
  "hipsters".

 animals_1: ("dogs" ; "hogs").

 animals_2: ("lions" ; "lionesses").

 verb: " like" ; " love".

 food: " meat!" ; " sausages!" ; " avocados!".

Norm's implementation gives us this XML representation of the <animals> rule:

 <rule name='animals'>
  <pragma pname='regex'>
   <pragma-data>"([dh]og|lion(esse)?|hipster)s"</pragma-data>
  </pragma>
  <alt>
   <nonterminal name='animals_1'>
    <pragma pname='regex'>
     <pragma-data>"[dh]ogs"</pragma-data>
    </pragma>
   </nonterminal>
  </alt>
  <alt>
   <nonterminal name='animals_2'>
    <pragma pname='regex'>
     <pragma-data>"lion(esse)?s"</pragma-data>
    </pragma>
   </nonterminal>
  </alt>
  <alt>
   <literal string='hipsters'/>
  </alt>
 </rule>
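To make the assumption about the `regex` pragma concrete, here is a minimal Python sketch that checks whether each pattern accepts exactly the strings its construct generates. The construct-to-strings table is transcribed by hand from the grammar above, and `mismatches` and the probe strings are hypothetical, not part of any iXML implementation. Because it only tests a finite set of candidates, it can expose a disagreement but cannot prove two languages equal. The `animals` pattern is written without a `?` on the group: `([dh]og|lion(esse)?|hipster)?s` would also accept a bare "s", which the rule does not generate.

```python
import re

# Strings each construct generates, transcribed by hand from the example grammar.
LANGUAGES: dict[str, set[str]] = {
    "animals_1": {"dogs", "hogs"},
    "animals_2": {"lions", "lionesses"},
}
LANGUAGES["animals"] = LANGUAGES["animals_1"] | LANGUAGES["animals_2"] | {"hipsters"}

# Pragma data for the hypothetical regex pragma on each construct.
PATTERNS: dict[str, str] = {
    "animals": r"([dh]og|lion(esse)?|hipster)s",
    "animals_1": r"[dh]ogs",
    "animals_2": r"lion(esse)?s",
}

# Near misses that none of the constructs should accept.
PROBES = {"s", "dog", "cats", "lionss", "hipster"}


def mismatches(construct: str) -> set[str]:
    """Return candidate strings on which the pattern and the construct disagree."""
    pattern = re.compile(PATTERNS[construct])
    language = LANGUAGES[construct]
    candidates = LANGUAGES["animals"] | PROBES
    return {
        s
        for s in candidates
        if (pattern.fullmatch(s) is not None) != (s in language)
    }


for name in PATTERNS:
    # An empty list is falsy, so constructs with no disagreements print "ok".
    print(name, sorted(mismatches(name)) or "ok")
```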

If I understand you correctly, you're suggesting that the specification should require pragmas to be interpreted according to a fixed order of precedence, so that (e.g.) the pragma on the "animals" rule takes precedence over the pragmas on its children, "animals_1" and "animals_2", which are effectively overridden by the pragma on their parent. Is that right?

I'd be very hesitant to do this, since it seems to me an unnecessary constraint on the semantics of the pragmas in question. If two different processors are given this grammar, either they both understand the semantics of the pragma, and will therefore process it identically (including its built-in rules about precedence), or at least one of them does not, in which case that processor cannot use the pragma and will not act upon it, so precedence is immaterial.

I can imagine cases in which a pragma would function best with precedence going to the element highest in the parse tree; I can also imagine cases in which the opposite is true. Either way, letting the semantics of the pragma itself specify the circumstances in which its effect does or doesn't apply seems consistent with the general view that we shouldn't place arbitrary limits on pragma semantics. This isn't a question of where the pragma belongs syntactically in the parse tree, but of how the pragma affects the processing of an input, which seems to me quite firmly a matter of semantics.
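One way to picture precedence living in the pragma's own semantics: a processor keeps a table of the pragmas it understands, each declaring its own precedence policy, and ignores any pragma not in the table. The Python sketch below is only an illustration under those assumptions; the pragma names `regex` and `trace`, the policy labels, and `effective_pragma` are hypothetical, and nothing here is specified by iXML or by the pragma proposal.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pragma:
    name: str
    data: str


@dataclass
class Construct:
    """A grammar construct (rule, nonterminal, literal, ...) with attached pragmas."""
    label: str
    pragmas: list[Pragma] = field(default_factory=list)
    children: list["Construct"] = field(default_factory=list)


# Hypothetical registry of the pragmas this processor understands. Each
# pragma's own semantics supply its precedence policy; the spec does not.
KNOWN_PRAGMAS = {
    "regex": "outermost",  # an ancestor's instance overrides its descendants'
    "trace": "innermost",  # the nearest enclosing instance wins
}


def effective_pragma(path: list[Construct], name: str) -> Optional[Pragma]:
    """Return the instance of pragma `name` that applies at the last construct
    in `path` (a root-to-node list), according to that pragma's own policy."""
    policy = KNOWN_PRAGMAS.get(name)
    if policy is None:
        return None  # unknown pragma: this processor simply does not act on it
    instances = [p for c in path for p in c.pragmas if p.name == name]
    if not instances:
        return None
    return instances[0] if policy == "outermost" else instances[-1]


# A cut-down version of the animals example, with one child nonterminal.
child = Construct("animals_1", [Pragma("regex", "[dh]ogs"), Pragma("trace", "on animals_1")])
parent = Construct(
    "animals",
    [Pragma("regex", "pattern for the whole rule"), Pragma("trace", "on animals")],
    [child],
)
path = [parent, child]

print(effective_pragma(path, "regex").data)  # the rule's pattern (outermost wins)
print(effective_pragma(path, "trace").data)  # "on animals_1" (innermost wins)
print(effective_pragma(path, "colour"))      # None: not a pragma this processor knows
```

Two processors sharing the same table resolve the nested instances identically; a processor without an entry for a pragma returns `None` and never consults a precedence rule at all.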

> I should be curious as to which use cases are advanced where a pragma of
> no or arbitrary scope is useful.

As I understand it, the objection to Requirements 9 and 10 is to the notion that implementations should share a common understanding of where a pragma belongs in the parse tree, i.e. which element of the grammar it is attached to. So the suggestion is not that a pragma would have no scope, or an arbitrary scope, for any given processor, but that different processors may interpret scope differently, and that the specification should not provide any guidance on the matter.

So, given this grammar:

 sentence: words++" ".

 {[pragma_name data content]}

 words: "absolutely", "whales", "earnest", "about", "data", "happy".

implementations might differ in their understanding of where the pragma sits in the parse tree. Possibilities include:

1. the pragma is attached to the preceding rule, "sentence":

 <rule name='sentence'>
  <pragma pname='pragma_name'>
   <pragma-data>data content</pragma-data>
  </pragma>
  <alt>
   [snip]
  </alt>
 </rule>

2. the pragma is attached to the following rule, "words":

 <rule name='words'>
  <pragma pname='pragma_name'>
   <pragma-data>data content</pragma-data>
  </pragma>
  <alt>
   [snip]
  </alt>
 </rule>

3. the pragma is attached to the literal "data", since the pragma-data starts with that string:

 <alt>
  <literal string='absolutely'/>
  <literal string='whales'/>
  <literal string='earnest'/>
  <literal string='about'/>
  <literal string='data'>
   <pragma pname='pragma_name'>
    <pragma-data>data content</pragma-data>
   </pragma>   
  </literal>
  <literal string='happy'/>
 </alt>

4. the pragma is attached to the grammar as a whole, since it appears between rules (after the closing period of one rule and before the LHS of the following rule):

 <ixml>
  <rule name='sentence'>
   [snip]
  </rule>
  <pragma pname='pragma_name'>
   <pragma-data>data content</pragma-data>
  </pragma>
  <rule name='words'>
   [snip]
  </rule>
 </ixml>

Other ways of interpreting the pragma's scope are no doubt possible. A user would have to check each implementation's rules for establishing the syntactic scope of a pragma in order to read and write grammars with pragmas. They would also have to expect that different processors would produce different parse trees when generating an XML version of an iXML grammar including pragmas.
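To make the interoperability concern concrete, here is a small Python sketch that models only the top-level items of the grammar above and applies three of the attachment policies listed (options 1, 2 and 4). The policy functions and the "ixml" fallback are hypothetical stand-ins for whatever an implementation might choose; the point is only that the same source yields a different parent for the pragma under each policy.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Rule:
    name: str


@dataclass
class PragmaItem:
    name: str
    data: str


Item = Union[Rule, PragmaItem]
Policy = Callable[[list[Item], int], str]

# Top-level items of the example grammar, in source order.
GRAMMAR: list[Item] = [
    Rule("sentence"),
    PragmaItem("pragma_name", "data content"),
    Rule("words"),
]


def preceding_rule(items: list[Item], i: int) -> str:
    """Option 1: attach to the nearest rule before the pragma."""
    for j in range(i - 1, -1, -1):
        if isinstance(items[j], Rule):
            return f"rule {items[j].name}"
    return "ixml"  # no earlier rule: fall back to the grammar itself


def following_rule(items: list[Item], i: int) -> str:
    """Option 2: attach to the nearest rule after the pragma."""
    for item in items[i + 1:]:
        if isinstance(item, Rule):
            return f"rule {item.name}"
    return "ixml"  # no later rule: fall back to the grammar itself


def whole_grammar(items: list[Item], i: int) -> str:
    """Option 4: a pragma between rules belongs to the grammar as a whole."""
    return "ixml"


def attach(items: list[Item], policy: Policy) -> dict[str, str]:
    """Map each pragma's name to the parent chosen by `policy`."""
    return {
        item.name: policy(items, i)
        for i, item in enumerate(items)
        if isinstance(item, PragmaItem)
    }


for policy in (preceding_rule, following_rule, whole_grammar):
    print(policy.__name__, attach(GRAMMAR, policy))
```

Option 3 (matching the pragma data against a literal) is left out because it needs the rule bodies, but it would add a fourth answer for the same source.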

I'm paraphrasing someone else's argument here, so I can only apologize if I've misrepresented anything, but this is how I understand the counter-proposal by opponents of Requirements 9 and 10.

> I think that makes sense; as a user, I would like to know what parts of
> the grammar to which the pragma will apply, and (presumably) an
> implementor necessarily has the same concern. Which means you have to
> (in some informal sense) bind a pragma to a grammar construct for a
> pragma to be useful.

Exactly. The current unresolved debate is whether the principles governing that binding should be established in the specification, or whether each implementation should be left to determine its own.

> This does not allow the "a pragma is a comment" identity, though; or at
> least, I think comments CAN hover between two rules. 

Yes - the syntax of comments as they're currently specified is not optimal for pragmas in a few minor ways. A proposal for adding pragmas need not make pragmas a variation on comments, of course - the requirements document deliberately tries to avoid making any assumptions about the ideal syntactic form for pragmas, so that the design process is open to innovative suggestions.

All bests,

Bethan


***

Dr. Bethan Tovey-Walsh 

linguacelta.com <http://linguacelta.com/> 

Golygydd | Editor http://geirfan.cymru <http://geirfan.cymru/> 

Croeso i chi ysgrifennu ataf yn y Gymraeg. (You are welcome to write to me in Welsh.)

> On 12 Feb 2025, at 02:43, Graydon <graydonish@gmail.com> wrote:
> 
> On Mon, Feb 10, 2025 at 01:15:09AM +0000, Bethan Tovey-Walsh scripsit:
>> Indeed, I suspect that anything you can imagine doing with a pragma
>> annotating a pragma, you can instead do with a single pragma.
> 
> Formally, I would also expect so.
> 
> A pragma in scope for a non-terminal and the same pragma in scope for
> another non-terminal that becomes a descendant of the XML element
> generated by the first non-terminal might need to have a defined
> interaction all the same. It could be left up to the implementers, but
> if there's any notion of scope it might be preferable to define which
> instance of the pragma applies in what context.
> 
> In requirements language, something like "a pragma must apply to one and
> only one grammar construct".  Possibly "at least one and no more than
> one grammar construct which is always the same grammar construct
> throughout an evaluation of the grammar if the evaluation considers
> that pragma".
> 
>>> Might it be agreed that the pragma scope needs to be identifiable?
>> 
>> Unfortunately, discussion about Requirements 9 and 10 in the last CG
>> meeting revealed some fervent objection to the idea that the
>> specification should say anything at all about pragma scope.
> 
> I should be curious as to which use cases are advanced where a pragma of
> no or arbitrary scope is useful.
> 
> [snip]
>> A pragma's scope tells us which grammar construct provides
>> its context.
>> 
>> Seen from a different perspective, scope is a matter of where the
>> pragma belongs in the grammar's hierarchy. A pragma essentially has a
>> parent, which provides its scope.
> 
> I think that makes sense; as a user, I would like to know what parts of
> the grammar to which the pragma will apply, and (presumably) an
> implementor necessarily has the same concern. Which means you have to
> (in some informal sense) bind a pragma to a grammar construct for a
> pragma to be useful.
> 
> This does not allow the "a pragma is a comment" identity, though; or at
> least, I think comments CAN hover between two rules. 
> 
> Which would require that the pragma syntax is unique and specified;
> implementers would not be free to tell you that some specific string in
> a comment is magic. (Since I can't think of any language which defines
> pragmas without giving them a specific syntax this does not strike me as
> alarming.)
> 
> -- Graydon
> 
> --  
> Graydon Saunders  | graydonish@fastmail.com
> Þæs oferéode, ðisses swá mæg.
> -- Deor  ("That passed, so may this.")
>
Received on Wednesday, 12 February 2025 16:55:38 UTC