- From: Graydon <graydonish@gmail.com>
- Date: Mon, 17 Feb 2025 12:41:38 -0500
- To: Bethan Tovey-Walsh <bytheway@linguacelta.com>
- Cc: ixml <public-ixml@w3.org>
On Fri, Feb 14, 2025 at 06:32:45PM +0000, Bethan Tovey-Walsh scripsit:
> > I would contend that the syntax of pragmas, on the other hand, needs
> > to allow someone reading a grammar to know which pragmas pertain to
> > which grammar constructs.
>
> I agree, but I don't think this extends to knowing which pragmas
> *actually have an effect on the processing of* a given construct. It
> just means knowing which construct the pragma is attached to, and
> which construct should therefore provide the context for that pragma's
> semantics to be interpreted.

If I am unable to know which pragmas actually have an effect, and I'm
looking at the serialized output, I'm looking at a debugging process
that starts with editing all the pragmas out of the grammar and putting
them back one at a time. (Where order might be important…)

Existing ixml is challenging in terms of debugging. I would like to see
some concern for pragmas not adding further challenges to an already
difficult process.

A pragma requirement to the effect that any processor which supports
pragmas must be able to produce a version of the input grammar
containing only those pragmas which will be processed would be welcome.
(And yes it would need some language about processor options.)

There's a frequent example of using a pragma to substitute in a regular
expression; given the requirement that a pragma not alter the parse tree
resulting from the grammar, this will require an equivalent regular
expression. While in some cases this ought to be straightforward ([Lu]
being trivially equivalent to \p{Lu}) it doesn't seem like it's going to
stay trivially equivalent.

If I want to use something like the regular expression

  (\*|_)(.+?)\1

in a grammar via pragma substitution for the RHS, I have to write the
grammar RHS first, and I think that's going to be something like

  ("_", ~[_]+, "_" | "*", ~[*]+, "*").

but I'd be lost if I had to prove it, or if the processor doesn't like
the substitution.
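Short of a proof, one way to probe an equivalence claim like that is to compare the two forms on sample strings. The sketch below is Python, not ixml, and rests on assumptions not stated in the message: that the pragma's regular expression must match the whole input, that Python's `re` dialect stands in for whatever dialect a processor would use, and that a second, hand-translated regular expression (`GRAMMAR_RE`) is a fair stand-in for the grammar alternative. The names `PRAGMA_RE`, `GRAMMAR_RE`, `accepts`, `disagreements` and `SAMPLES` are hypothetical.

```python
import re

# The regular expression from the message, as a pragma might supply it.
PRAGMA_RE = re.compile(r"(\*|_)(.+?)\1")

# Hand translation of the grammar RHS
#   ("_", ~[_]+, "_" | "*", ~[*]+, "*")
# into a regular expression. This is an assumed stand-in for the ixml,
# used only so both sides can be run by the same engine.
GRAMMAR_RE = re.compile(r"_[^_]+_|\*[^*]+\*")


def accepts(pattern, text):
    """Return True when the pattern matches the whole string."""
    return pattern.fullmatch(text) is not None


def disagreements(samples):
    """Return the samples on which the two patterns give different answers."""
    return [s for s in samples
            if accepts(PRAGMA_RE, s) != accepts(GRAMMAR_RE, s)]


# Illustrative inputs, chosen by hand.
SAMPLES = ["_abc_", "*abc*", "_a*b_", "_a_b_", "_a\nb_", "_abc*", "__"]

if __name__ == "__main__":
    for s in disagreements(SAMPLES):
        print(repr(s))
```

Under those assumptions the two forms disagree on `_a_b_` (when the whole string must match, the lazy `.+?` can still stretch across an inner delimiter, which `~[_]+` cannot) and on a body containing a newline (`.` excludes newlines by default in Python, while a negated set does not). Agreement on a handful of samples would not be a proof either; it only shows how quickly "trivially equivalent" stops being trivial.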
And I'd be in trouble if the grammar version was

  (-"_", ~[_]+, -"_" | -"*", ~[*]+, -"*").

(as seems reasonable; drop the delimiters) because in regular expression
terms that's not match, that's replace, and ixml is doing extract more
than it's doing match and these are not conceptually identical.

This makes me want there to be a pragma requirement that makes it
possible to know if the thing going wrong is in the pragma or in the
grammar; perhaps something as simple as "if a processor supports
pragmas, it must have an option to run without pragmas" is the
sufficiently general choice there.

But I do want to stress as much as I can that the main problem with
using ixml is debugging and that the general trend with pragmas gives me
the impression that pragmas are going to make the difficult worse. That
may be inevitable; greater complexity makes debugging harder. But it
doesn't seem like a good idea to accept stacking barriers to adoption
higher than they are, either.

-- Graydon

--
Graydon Saunders | graydonish@fastmail.com
Þæs oferéode, ðisses swá mæg.
-- Deor ("That passed, so may this.")
Received on Monday, 17 February 2025 17:41:44 UTC