Re: Thoughts on pragmas and iXML from Graydon on 2025-02-18 (public-ixml@w3.org from February 2025)

From: Graydon <graydonish@gmail.com>
Date: Tue, 18 Feb 2025 11:23:15 -0500
To: Bethan Tovey-Walsh <bytheway@linguacelta.com>
Cc: ixml <public-ixml@w3.org>
Message-ID: <Z7Sz8_vO5OLFUAcI@menja.localdomain>

On Tue, Feb 18, 2025 at 11:29:01AM +0000, Bethan Tovey-Walsh scripsit:
> I can't imagine why you'd choose to put pragmas in grammars used in
> that kind of codebase. In that situation, I'd be making 100% sure that
> the iXML processor used in production was strictly used in
> "no-pragmas" mode. If there's a requirement for software-agnostic
> code, in particular, you absolutely can't use pragmas. 

I agree with all of these points, very much.

> But people write bad code. It's an absolute pain in the behind when
> they do.

Entirely so, and sometimes I find that the person writing the bad code
was Past Me.

[snip]
> Finally, let me ask you this: if you want the spec to insist on rules
> for precedence, what do you suggest those rules should be? What
> principles should we use to select the best and most useful set of
> precedence rules?

I have absolutely no difficulty with the idea of needing to understand
what a pragma does; what I'm concerned about is the prospect of not
being able to tell which pragma I need to understand.

In positive language, I want to be able to find out which pragma I need
to understand while debugging the output of an iXML grammar.

To that end, I would propose:

- pragmas have names, and by analogy with processing instructions, all
  pragmas with a particular name which pertain to any component of a
  grammar are the same pragma; name is identity. The name mechanism
  requires the name to identify the processor, and this portion of the
  name must be enough to make every pragma name implemented in that
  processor distinct from the pragma names of any other processor.
  (Minimally; "but what about commonly defined pragmas?" seems out of
  scope.)

- when a pragma pertains to a grammar component, at most one pragma of
  that name pertains to that component. (Again continuing the analogy to
  processing instructions, we know the name; we do not know anything
  else. So different instances of the same pragma with different data
  could do different things. And the intent is to know which pragma
  applies to what node in the result, so two of the same name and
  different data would introduce ambiguity.)

- by default, no pragma is applied when a grammar is processed. A
  processor must require some positive action to process pragmas. (I
  intend these two sentences to be equivalent.)

- in the same conceptual way that the sequence constructor in
  xsl:for-each happens simultaneously for every member of the input
  sequence, the entire parse tree is constructed before any pragma is
  applied. A pragma pertains to a node in the parse tree when it
  pertained to the component of the grammar which constructed that
  node. When pragmas are applied to the parse tree, the pragmas which
  pertain to a node are applied in least-ancestral order with respect
  to that node's position in the parse tree.
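
A minimal sketch of how the four points above could fit together, in
Python. Everything in it is hypothetical: the Component and Node
classes, the pragma_order function, the process_pragmas flag, and the
"procA:" names are illustrations, not anything from the iXML spec or
an existing processor. It also assumes one reading of "least-ancestral
order", namely that descendants are handled before their ancestors.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A grammar component (hypothetical model, not from the iXML spec)."""
    label: str
    pragmas: dict = field(default_factory=dict)  # pragma name -> data

    def attach(self, name, data):
        # Name is identity: a second pragma of the same name on the same
        # component is rejected rather than merged or overridden.
        if name in self.pragmas:
            raise ValueError(f"{self.label}: duplicate pragma {name!r}")
        self.pragmas[name] = data


@dataclass
class Node:
    """A parse-tree node that remembers which component constructed it."""
    component: Component
    children: list = field(default_factory=list)


def pragma_order(root, process_pragmas=False):
    """List (component label, pragma name) pairs in application order.

    The tree passed in is already complete, so no pragma can affect
    how the parse tree itself was built.
    """
    if not process_pragmas:  # default: no pragma is applied
        return []
    order = []

    def visit(node):
        for child in node.children:  # assumed reading: descendants first
            visit(child)
        # Order among differently named pragmas on one node is not
        # covered by the proposal; attachment order is used here.
        for name in node.component.pragmas:
            order.append((node.component.label, name))

    visit(root)
    return order


# The processor prefix ("procA:") is an assumed way of keeping pragma
# names distinct between processors.
date = Component("date")
year = Component("year")
date.attach("procA:rename", "calendar-date")
year.attach("procA:pad", "4")
tree = Node(date, [Node(year)])
```

With this in hand, pragma_order(tree) lists nothing unless
process_pragmas=True is passed, and a debugger can map each entry in
the list back to exactly one pragma on exactly one grammar component.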

So no need to constrain the data, or at least I'm not perceiving one;
pragmas pertain to knowable components of the grammar and they're
applied in a known order. I think that's enough to make pragmas not the
worst thing with respect to debugging a grammar and I think that's all
that it is appropriate to ask.

I'd like it if a processor had an option to activate pragmas by testing
membership in a sequence of pragma names, but I don't think that's
anywhere near to being a requirement.
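
As a rough illustration of that option, assuming a processor keeps the
pragmas attached to a component as a mapping from name to data: the
function name active_pragmas, the enabled_names parameter, and the
"procA:"/"procB:" names below are all made up for the example.

```python
def active_pragmas(attached, enabled_names=()):
    """Keep only the pragmas whose names the user explicitly enabled.

    attached:      mapping of pragma name -> pragma data (hypothetical)
    enabled_names: sequence of pragma names to activate; empty by
                   default, so nothing is active without positive action
    """
    enabled = set(enabled_names)
    return {name: data for name, data in attached.items() if name in enabled}


attached = {"procA:pad": "4", "procB:trace": ""}
nothing = active_pragmas(attached)
only_pad = active_pragmas(attached, ["procA:pad"])
```

Here nothing is an empty mapping and only_pad holds just the
"procA:pad" entry, so the list of enabled names doubles as the list of
pragmas one needs to understand while debugging.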

I acknowledge that there's a potential problem in that many grammar
components may construct some part of what becomes a single text node in
the result; my expectation is that convention (e.g. character
progression direction order) could deal with that, but I'm not even
approximately an implementer.


-- Graydon

--  
Graydon Saunders  | graydonish@fastmail.com
Þæs oferéode, ðisses swá mæg.
-- Deor  ("That passed, so may this.")
Received on Tuesday, 18 February 2025 16:23:22 UTC