Re: Thoughts on pragmas and iXML from Bethan Tovey-Walsh on 2025-02-19 (public-ixml@w3.org from February 2025)

From: Bethan Tovey-Walsh <bytheway@linguacelta.com>
Date: Wed, 19 Feb 2025 14:18:43 +0000
To: Graydon Saunders <graydonish@gmail.com>
Cc: ixml <public-ixml@w3.org>
Message-Id: <A0A86D9F-5E32-43B3-BBD3-49E2788692F0@linguacelta.com>
> If the effects of a pragma depend on the data, knowing what pragma
> applied depends on their only being one of that name because the data
> (and thus outcome) can be arbitrarily different and we don't want to
> constrain the data.

The data won't be arbitrarily different for the same pragma, though. A pragma will have its own internal logic. It should be no harder to understand what using the same pragma twice means than to understand what using two different pragmas means. Your rule wouldn't stop anyone using two pragmas with identical semantics, provided they had different names. If you couldn't tell which of

{[pragma_a data_1]}{[pragma_a data_2]} 

applied, then you also couldn't tell which of

{[pragma_a data_1]}{[pragma_b data_2]} 

applied, given two pragmas with the same semantics. So the restriction isn't actually a restriction at all. If someone wants to do it, there's an easy workaround. If no-one wants to do it, then we don't need to legislate against doing it in the first place.
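
To make the workaround concrete, here is a minimal Python sketch. Everything in it is hypothetical: the (name, data) pair model, the duplicate_names check, and the SEMANTICS table that treats pragma_b as an alias of pragma_a are assumptions for illustration, not anything from the iXML spec or a real processor.

```python
from collections import Counter

# Hypothetical model: a pragma is a (name, data) pair on one grammar component.
def duplicate_names(pragmas):
    """Return the pragma names that occur more than once on a component."""
    counts = Counter(name for name, _ in pragmas)
    return sorted(name for name, n in counts.items() if n > 1)

# Hypothetical processor table: two names, one behaviour.
SEMANTICS = {"pragma_a": "same-behaviour", "pragma_b": "same-behaviour"}

same_name = [("pragma_a", "data_1"), ("pragma_a", "data_2")]
renamed = [("pragma_a", "data_1"), ("pragma_b", "data_2")]

print(duplicate_names(same_name))           # names the uniqueness rule would reject
print(duplicate_names(renamed))             # names rejected after renaming
print({SEMANTICS[n] for n, _ in renamed})   # distinct behaviours actually in play
```

The renamed list passes the name-uniqueness check even though both pragmas map to a single behaviour, so the ambiguity the rule was meant to prevent is still there.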

> I'd prefer default-suppressed because that's less editing of the grammar
> for debugging purposes. (And typing is bugs.)

Either way, you'll have the option to turn them off or on. I don't think this would mean control over every single pragma independently of the others, though, so you'd still need to edit them out for selective debugging, or find a processor whose author is willing to give you more granular controls.
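
As a rough sketch of the difference between a single on/off switch and more granular control, assuming a hypothetical active_pragmas function with made-up option names (enabled, only) that no real processor is known to use:

```python
def active_pragmas(pragmas, enabled=False, only=None):
    """Hypothetical option handling for a processor.

    enabled: the single global switch (suppressed by default in this sketch).
    only:    optional set of pragma names, i.e. the more granular control
             that a processor author might or might not choose to offer.
    """
    if not enabled:
        return []
    if only is None:
        return list(pragmas)
    return [name for name in pragmas if name in only]

names = ["trace", "priority", "rename"]  # hypothetical pragma names

print(active_pragmas(names))                               # global switch off
print(active_pragmas(names, enabled=True))                 # global switch on
print(active_pragmas(names, enabled=True, only={"trace"})) # granular selection
```

With only the global switch, selective debugging still means editing pragmas out of the grammar; the only parameter stands in for the extra control a processor would have to provide.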

> Which is roughly "the more specific a pragma is, the less priority it
> has"; how an implementer achieves that isn't (and I think shouldn't be)
> specified.

The problem with this is that there's no constraint on where the pragma has its effect, only on where it appears in the hierarchical structure generated by parsing. The consensus in CG meetings has been strongly against saying anything about where the effect of a pragma may apply. So a pragma at the root element might notionally have some effect on a descendant many levels down in the hierarchy. A pragma on a child might have some effect on its parent or on a more distant ancestor. So the definition of "more specific" would be purely syntactic: A pragma that is attached to a parent has its effect before a pragma that is attached to a child, even if the child-pragma affects its own grandparent, and the parent-pragma affects its own great-grandchild.
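
A small Python sketch of what "purely syntactic" ordering would mean. The Pragma record, its depth fields, and the syntactic_order function are all hypothetical; the depths are chosen to match the example above (a child-pragma affecting its grandparent, a parent-pragma affecting its great-grandchild).

```python
from dataclasses import dataclass

@dataclass
class Pragma:
    name: str
    attached_depth: int  # depth of the node the pragma is written on (root = 0)
    affects_depth: int   # depth of the node it changes (unconstrained by any rule)

def syntactic_order(pragmas):
    """Order by attachment depth only: parents before children."""
    return sorted(pragmas, key=lambda p: p.attached_depth)

pragmas = [
    # Attached at depth 3, affects its grandparent at depth 1.
    Pragma("child_pragma", attached_depth=3, affects_depth=1),
    # Attached at depth 2, affects its great-grandchild at depth 5.
    Pragma("parent_pragma", attached_depth=2, affects_depth=5),
]

order = [p.name for p in syntactic_order(pragmas)]
by_effect = [p.name for p in sorted(pragmas, key=lambda p: p.affects_depth)]
print(order)      # order implied by where the pragmas are written
print(by_effect)  # order implied by where their effects land
```

The two orderings disagree, which is the difficulty: a rule based on attachment position says nothing reliable about which effect is "more specific".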

As Norm's already said, I think it's possible to start imagining monstrous grammars with pragmas like Hydra heads. But that's just really unlikely. Forging a mythical sword to defeat the Hydra is reasonable if there's a strong chance of Hydras in your immediate area; otherwise, it's a massive amount of effort which takes resources from other, more pressing tasks, and the sword just ends up being used to chop vegetables (and lops off the odd finger in the process).

BTW

___________________________________________________ 
Dr. Bethan Tovey-Walsh 

linguacelta.com

Golygydd | Editor geirfan.cymru

Croeso i chi ysgrifennu ataf yn y Gymraeg.

> On 18 Feb 2025, at 20:52, Graydon <graydonish@gmail.com> wrote:
> 
> On Tue, Feb 18, 2025 at 07:48:07PM +0000, Bethan Tovey-Walsh scripsit:
> [snip]
>>> - when a pragma pertains to a grammar component, at most one pragma
>>> of that name pertains to that component.
>> 
>> Not entirely sure about this one, to be honest. I can't immediately
>> see much harm in it, but it smells like over-constraint to me.
> 
> If the effects of a pragma depend on the data, knowing what pragma
> applied depends on their only being one of that name because the data
> (and thus outcome) can be arbitrarily different and we don't want to
> constrain the data.
> 
>>> by default, no pragma is applied when a grammar is processed. A
>>> processor must require some positive action to process pragmas.
>> 
>> We're talking about requiring processors to have some kind of
>> "suppress-pragmas" option. Maybe that could be reversed, so that
>> suppress-pragmas is default, and include-pragmas requires an active
>> choice. Worth discussing once we have a design on the table.
> 
> I'd prefer default-suppressed because that's less editing of the grammar
> for debugging purposes. (And typing is bugs.)
> 
>>> in the same conceptual way that the sequence constructor in
>>> xsl:for-each happens simultaneously for every member of the input
>>> sequence, the entire parse tree is constructed before any pragma is
>>> applied. When pragmas are applied to the parse tree, those pragmas
>>> which pertain to a node in the parse tree because they pertained to
>>> the component of the grammar which constructed that node of the
>>> parse tree are applied in least-ancestral order with respect to
>>> that node's position in the parse tree.
>> 
>> Okay, this bit is the really problematic bit, for me
>> 
>> There are absolutely no constraints on the form taken by the input
>> grammar once it's been parsed.
> 
> I agree about the input grammar entirely.
> 
> When I said "parse tree" there, I intended to mean the parse tree of the
> XML result of the grammar; it can't be the final result because pragmas
> may do things between the parse tree and the final serialized result.
> 
>> More importantly, there's no telling exactly when a processor will
>> need to apply a pragma. Depending on how the particular processor
>> works, there may be pragmas which have an effect on some tokenization
>> step, or on a pre-processing of the grammar or the input, or on the
>> parse itself, or on pruning the parse forest, or on serialization.
>> Some of these must occur before the parse tree is even constructed. If
>> we insist that pragmas must be processed in least-ancestral order, we
>> require grammar authors to know exactly when a pragma will be acted
>> upon by the processor, in order to write them in the right order in
>> the grammar.
> 
> What I'm trying to mean here is that in the same way I don't know in
> what order an XSLT transform does anything, but I _do_ know that what I
> get back will be ordered on the basis of the document order of the
> source document nodes which produced nodes in the result document, when
> I get the XML result of processing the text input by the combination of
> grammar and processor, the pragmas which pertain will appear to have
> been applied in least-ancestral-first order to any XML node to which a
> pragma pertained because it pertained to a construct in the grammar.
> 
> Which is roughly "the more specific a pragma is, the less priority it
> has"; how an implementer achieves that isn't (and I think shouldn't be)
> specified.
> 
> Like the XSLT case, this isn't intended to mean the processor had to do
> it in that order, only that the result looks like things happened in
> that order.
> 
> Grammar authors only need to know which pragmas pertain to what
> constructs, which was already so by the syntax rules, same as someone
> writing XSLT doesn't need to know the order of execution for templates.
> 
> And the intended point to it is that you can have some idea what
> happened to that XML node in your result in terms of the pragmas which
> pertained to it, NOT that you know the actual order of execution. (Nor
> do you know that every pragma was applied; if a grammar-scope pragma
> would have overwritten every more-narrowly scoped pragma, an
> implementation could properly apply only the grammar-scoped pragma.)
> 
> -- Graydon
> 
> 
> --  
> Graydon Saunders  | graydonish@fastmail.com
> Þæs oferéode, ðisses swá mæg.
> -- Deor  ("That passed, so may this.")
>
Received on Wednesday, 19 February 2025 14:19:02 UTC