Re: Thoughts on pragmas and iXML

On Fri, Feb 14, 2025 at 06:32:45PM +0000, Bethan Tovey-Walsh scripsit:
> > I would contend that the syntax of pragmas, on the other hand, needs
> > to allow someone reading a grammar to know which pragmas pertain to
> > which grammar constructs.
> 
> I agree, but I don't think this extends to knowing which pragmas
> *actually have an effect on the processing of* a given construct. It
> just means knowing which construct the pragma is attached to, and
> which construct should therefore provide the context for that pragma's
> semantics to be interpreted.

If I am unable to know which pragmas actually have an effect, and I'm
looking at the serialized output, I'm looking at a debugging process
that starts with editing all the pragmas out of the grammar and putting
them back one at a time. (Where order might be important…)

Existing ixml is challenging in terms of debugging. I would like to see
some concern that pragmas not add further challenges to an already
difficult process. I would welcome a requirement that any processor
which supports pragmas be able to produce a version of the input
grammar containing only those pragmas which will be processed.
(And yes, it would need some language about processor options.)

There's a frequent example of using a pragma to substitute in a regular
expression; given the requirement that a pragma not alter the parse tree
resulting from the grammar, this will require an equivalent regular
expression. While in some cases this ought to be straightforward ([Lu]
being trivially equivalent to \p{Lu}) it doesn't seem like it's going to
stay trivially equivalent. If I want to use something like the regular
expression

(\*|_)(.+?)\1

in a grammar via pragma substitution for the RHS, I have to write the
grammar RHS first, and I think that's going to be something like

("_", ~["_"]+, "_" | "*", ~["*"]+, "*").

but I'd be lost if I had to prove it, or if the processor doesn't like
the substitution. And I'd be in trouble if the grammar version was

(-"_", ~["_"]+, -"_" | -"*", ~["*"]+, -"*").

(as seems reasonable; drop the delimiters), because in regular
expression terms that's not a match, that's a replace. ixml is doing
extraction more than it's doing matching, and these are not
conceptually identical.
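
A rough way to probe the equivalence question is to run both forms over
a few inputs. The sketch below is in Python and rests on assumptions:
grammar_accepts is a hypothetical hand-written stand-in for the
alternation above (it is not an ixml processor), the sample strings are
made up, and the regex is applied with fullmatch on the assumption that
a substituted pattern has to cover exactly what the rule would cover.

```python
import re

# The regular expression quoted above, unchanged.
PATTERN = re.compile(r"(\*|_)(.+?)\1")


def regex_accepts(text):
    # Require the regex to cover the whole input, as a grammar rule would.
    return PATTERN.fullmatch(text) is not None


def grammar_accepts(text):
    # Hypothetical stand-in for the grammar alternation: a delimiter,
    # one or more characters that are not that delimiter, then the
    # same delimiter again.
    if len(text) < 3 or text[0] not in "_*" or text[-1] != text[0]:
        return False
    return text[0] not in text[1:-1]


def regex_extract(text):
    # What the delimiter-dropping grammar would keep: group 2 only.
    m = PATTERN.fullmatch(text)
    return m.group(2) if m else None


# Made-up sample inputs, chosen to poke at the edges.
SAMPLES = ["_a_", "*a*", "_a*b_", "_a_b_", "*a\nb*", "_a*"]
disagreements = [s for s in SAMPLES if regex_accepts(s) != grammar_accepts(s)]
print(disagreements)
```

Under these assumptions the two forms disagree on "_a_b_" (the lazy
".+?" can still run across an inner delimiter, where the negated set
cannot) and on a body containing a newline (Python's "." does not match
a newline by default, while the negated set would). The
delimiter-dropping version corresponds to regex_extract, which returns
group 2, not to the whole match.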

This makes me want there to be a pragma requirement that makes it
possible to know if the thing going wrong is in the pragma or in the
grammar; perhaps something as simple as "if a processor supports
pragmas, it must have an option to run without pragmas" is a
sufficiently general choice there. But I do want to stress as much as I
can that the main problem with using ixml is debugging, and that the
general trend with pragmas gives me the impression that pragmas are
going to make a difficult thing worse.

That may be inevitable; greater complexity makes debugging harder. But
it doesn't seem like a good idea to accept stacking barriers to adoption
higher than they are, either.

-- Graydon

--  
Graydon Saunders  | graydonish@fastmail.com
Þæs oferéode, ðisses swá mæg.
-- Deor  ("That passed, so may this.")

Received on Monday, 17 February 2025 17:41:44 UTC