Re: Thoughts on pragmas and iXML from Bethan Tovey-Walsh on 2025-02-17 (public-ixml@w3.org from February 2025)

From: Bethan Tovey-Walsh <bytheway@linguacelta.com>
Date: Mon, 17 Feb 2025 21:00:38 +0000
To: Graydon Saunders <graydonish@gmail.com>
Cc: ixml <public-ixml@w3.org>
Message-Id: <C87D54EA-B948-466B-B1C6-D6DBF1A6A207@linguacelta.com>
> If I am unable to know which pragmas actually have an effect, 

Just because there are no overriding syntactic rules, that doesn't mean you cannot know what the semantic rules are, and how they interact. You will need to know how the pragma functions, and part of that will involve knowing how pragmas interact. If you have a "replace this element with a giraffe" pragma and an "avoid replacing this element with a giraffe at all costs" pragma, you can't use them together. That isn't a syntactic constraint; it's a semantic one. Similarly, if you have "replace this pragma with the immediately following pragma", you have to add a second pragma to follow it. And if you have "process this pragma, then ignore all other pragmas on the same element", you will know that any subsequent pragmas cannot have any effect. If there's a pragma in your grammar, you need to know what it does. That includes knowing what it does to any other pragmas around it, just like any other grammar constructs. It's all about becoming familiar with the semantics of each pragma.

Now, we're talking as though there are going to be hundreds of pragmas, all vying for attention. In reality, I think that's very unlikely. So working out how the pragmas will interact with each other isn't likely to be terribly complicated. 

But I absolutely don't accept that you will ever be "unable to know which pragmas actually have an effect". You won't know simply from the pragma's location in the grammar; you will know because you won't use a pragma without having understood its semantics.

> I'm looking at a debugging process
> that starts with editing all the pragmas out of the grammar and putting
> them back one at a time.

I think this is a good argument for requiring processors to have a "run without pragmas" mode, as you suggest later. I think we discussed that idea in a CG meeting, in relation to Requirement 3 ("Support for pragmas must be optional."). It isn't really a requirement for designing pragmas into the language, exactly - more a matter of the prose we add to the spec. So it's probably best dealt with further down the line, when we actually have a proposal and are drafting the supporting prose.

> A pragma requirement to the effect that any processor
> which supports pragmas must be able to produce a version of the input
> grammar containing only those pragmas which will be processed would be
> welcome. 

I agree, but I think it's a quality-of-implementation issue. We're still fairly early in the life of iXML, so tools for debugging and so on are not particularly mature yet. I think getting into exactly what options should be offered by implementations is probably not ideal unless there's some overwhelming reason to do so. But there will be little point in supporting pragmas if they are impossible to use, so I find it unlikely that implementers will neglect to add strategies to make life easier, over time.

> because in regular expression
> terms that's not match, that's replace, and ixml is doing extract more
> than it's doing match and these are not conceptually identical.

I don't really imagine that a regex pragma would do anything other than specify the match pattern. In some ways, it would be like an inline tokenization, telling the processor "just skip over the next twenty characters while you're parsing, because I know they match this nonterminal". I wouldn't expect it to express which bits are to be suppressed etc. Whether there would be any processing benefit in substituting a regex in the parse would depend entirely on the grammar fragment for which the regex is proposed, and including substitutions in the iXML representation might have some effect on that. But that's a matter for the implementer to deal with.

And, indeed, you'd want to be sure that your regex recognized the same language as your grammar fragment, so it would probably be a good pragma for things that are easy to match, but where parsing character by character is rather expensive. 
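
As a rough illustration of what "recognizes the same language" could mean in practice, here is a minimal Python sketch that compares the regex and the first grammar fragment from the quoted message on every short string over a small alphabet. It is only a sketch under stated assumptions: the function names (grammar_matches, disagreements) are hypothetical, the recognizer is hand-written rather than produced by an iXML processor, whole-string acceptance is used as the notion of "match", and a bounded search can find counterexamples but cannot prove equivalence.

```python
import re
from itertools import product

# Regex from the quoted message: a delimiter, a lazy run of any
# characters, then the same delimiter again.
REGEX = re.compile(r"(\*|_)(.+?)\1")


def grammar_matches(s: str) -> bool:
    """Hand-written recognizer (an assumption, not processor output) for
    ("_", ~[_]+, "_" | "*", ~[*]+, "*")."""
    # Need an opening delimiter, at least one inner character, and a
    # closing delimiter identical to the opening one.
    if len(s) < 3 or s[0] not in "_*" or s[-1] != s[0]:
        return False
    # The inner run must not contain the delimiter that opened the string.
    return s[0] not in s[1:-1]


def disagreements(alphabet: str, max_len: int) -> list[str]:
    """Return every string up to max_len on which the two recognizers differ."""
    found = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            regex_ok = REGEX.fullmatch(s) is not None
            if regex_ok != grammar_matches(s):
                found.append(s)
    return found


if __name__ == "__main__":
    print(disagreements("_*a", 3))
```

Run as a script, this prints ['___', '***']: the regex accepts a delimiter as the inner character and the grammar fragment does not. Longer strings such as "_a_b_" disagree in the same way, so this particular pair would not be a safe substitution without tightening the regex.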

> But it doesn't seem like a good idea to accept stacking barriers to adoption
> higher than they are, either.


I think it's very important to remember that using pragmas in your grammars will be entirely optional. If you find that they complicate your workflow, there's absolutely no reason you should use them. They will only add complexities for implementers, and for authors who choose to use them.

BTW

___________________________________________________ 
Dr. Bethan Tovey-Walsh 

linguacelta.com

Golygydd | Editor geirfan.cymru

Croeso i chi ysgrifennu ataf yn y Gymraeg.

> On 17 Feb 2025, at 17:41, Graydon <graydonish@gmail.com> wrote:
> 
> On Fri, Feb 14, 2025 at 06:32:45PM +0000, Bethan Tovey-Walsh scripsit:
>>> I would contend that the syntax of pragmas, on the other hand, needs
>>> to allow someone reading a grammar to know which pragmas pertain to
>>> which grammar constructs.
>> 
>> I agree, but I don't think this extends to knowing which pragmas
>> *actually have an effect on the processing of* a given construct. It
>> just means knowing which construct the pragma is attached to, and
>> which construct should therefore provide the context for that pragma's
>> semantics to be interpreted.
> 
> If I am unable to know which pragmas actually have an effect, and I'm
> looking at the serialized output, I'm looking at a debugging process
> that starts with editing all the pragmas out of the grammar and putting
> them back one at a time. (Where order might be important…)
> 
> Existing ixml is challenging in terms of debugging. I would like to see
> some concern for pragmas not adding further challenges to an already
> difficult process. A pragma requirement to the effect that any processor
> which supports pragmas must be able to produce a version of the input
> grammar containing only those pragmas which will be processed would be
> welcome. (And yes it would need some language about processor options.)
> 
> There's a frequent example of using a pragma to substitute in a regular
> expression; given the requirement that a pragma not alter the parse tree
> resulting from the grammar, this will require an equivalent regular
> expression. While in some cases this ought to be straightforward ([Lu]
> being trivially equivalent to \p{Lu}) it doesn't seem like it's going to
> stay trivially equivalent. If I want to use something like the regular
> expression
> 
> (\*|_)(.+?)\1
> 
> in a grammar via pragma substitution for the RHS, I have to write the
> grammar RHS first, and I think that's going to be something like
> 
> ("_", ~[_]+, "_" | "*", ~[*]+, "*").
> 
> but I'd be lost if I had to prove it, or if the processor doesn't like
> the substitution. And I'd be in trouble if the grammar version was
> 
> (-"_", ~[_]+, -"_" | -"*", ~[*]+, -"*").
> 
> (as seems reasonable; drop the delimiters) because in regular expression
> terms that's not match, that's replace, and ixml is doing extract more
> than it's doing match and these are not conceptually identical.
> 
> This makes me want there to be a pragma requirement that makes it
> possible to know if the thing going wrong is in the pragma or in the
> grammar; perhaps something as simple as "if a processor supports
> pragmas, it must have an option to run without pragmas" is the
> sufficiently general choice there.  But I do want to stress as much as I
> can that the main problem with using ixml is debugging and that the
> general trend with pragmas gives me the impression that pragmas are
> going to make the difficult worse.
> 
> That may be inevitable; greater complexity makes debugging harder. But
> it doesn't seem like a good idea to accept stacking barriers to adoption
> higher than they are, either.
> 
> -- Graydon
> 
> --  
> Graydon Saunders  | graydonish@fastmail.com
> Þæs oferéode, ðisses swá mæg.
> -- Deor  ("That passed, so may this.")
>
Received on Monday, 17 February 2025 21:01:03 UTC