Re: Thoughts on pragmas and iXML

It is rash of me to wade into this discussion, but let's see how far I
get before the waters close over me.

On Wed, Feb 05, 2025 at 02:03:31PM +0000, Bethan Tovey-Walsh scripsit:
> 1. The "foundational rules", which were unanimously accepted as being
> of primary importance by the CG - i.e. the first two requirements:
> pragmas must not affect the syntactic validity, or change the
> semantics, of the base grammar. 

I have formed a vague impression that one of the functional constraints
on pragmas is that the pragma syntax, whatever it turns out to be, should
not make it difficult to parse an ixml grammar with ixml.

From a practical perspective, pragmas as an idea are a way to provide
information outside the scope of the language. In the ixml case,
something that comes to mind immediately is serialization of the output.
That isn't something that fits in the grammar but it is something a user
would care about.

So we have to be able to identify which pragma is meant, and bind that
pragma to a scope (I think, effectively "result", "non-terminal",
"terminal"). It also needs to be possible for an implementer to do what
they like; one might offer only "you can have indented output" for
serialization and another might offer the full set of options for
`xsl:result-document`.

I'm inclined to think that the general trend of passing information
around as maps is a good one, and that a pragma ought to be expressed as
an XPath map. The map must have a `name` entry whose value is a QName
and a `scope` entry whose value comes from a set list (presumably
'result', 'non-terminal', 'terminal' or words to that effect); beyond
those it may have any other XPath map entry it wants.
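
A minimal sketch of checking those map requirements, written in Python
with a dict standing in for the XPath map. The function name, the scope
vocabulary, and the ASCII-only QName pattern are all assumptions made
for illustration; nothing here is settled syntax or an existing API.

```python
import re

# Assumed scope vocabulary; the proposal only says "or words to that effect".
ALLOWED_SCOPES = {"result", "non-terminal", "terminal"}

# Simplified, ASCII-only approximation of a QName (optional prefix + local name).
_NCNAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"
QNAME_RE = re.compile(rf"(?:{_NCNAME}:)?{_NCNAME}")


def validate_pragma(pragma: dict) -> list:
    """Return a list of problems; an empty list means the map is acceptable."""
    problems = []

    name = pragma.get("name")
    if not isinstance(name, str) or not QNAME_RE.fullmatch(name):
        problems.append("'name' must be present and look like a QName")

    scope = pragma.get("scope")
    if not isinstance(scope, str) or scope not in ALLOWED_SCOPES:
        problems.append(
            "'scope' must be one of " + ", ".join(sorted(ALLOWED_SCOPES))
        )

    # Any other entries are implementer-defined and deliberately not checked.
    return problems
```

Only `name` and `scope` are inspected, so an implementer stays free to
give meaning to entries such as 'indent' or 'form' or to ignore them.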

I'd further want to say that pragma definitions go at the top of the
ixml grammar file and 'result' scope pragmas don't appear anywhere else.
A non-terminal pragma reference appears to the left of the equals sign. A
terminal pragma reference appears to the left of the full stop. (Should
there be a space and nothing else between the pragma reference and the
equals sign or stop? Probably.) Pragma references are by name, so pragma
definitions are required to have unique names within a single grammar. What
names have meaning is up to the implementer.

Something vaguely like:

#%# { 'name': 'serialize',
      'scope': 'result',
      'indent': 'yes'
}

#%# { 'name': 'normal-form',
      'scope': 'non-terminal',
      'form': 'NFKD'
}

#%# { 'name': 'trouble',
      'scope': 'terminal'
}

...

fullQuote %#%normal-form = openQuote, quotedText, closeQuote.


NL = #A %#%trouble .


I don't know if a "done with pragmas" marker would be desirable; I am
quite sure the #%# and %#% will not be anyone else's first thought. But
I think the general pattern of "a pragma is defined by a map, the map
has requirements, the pragmas are defined at the top of the file,
sub-whole-result pragmas are used by reference" is both manageable for
an implementer and retains a syntax where it is relatively
straightforward to process an ixml grammar with ixml. (And the pragmas
can clearly be deleted from the grammar document without affecting its
utility as an ixml grammar.)
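
To illustrate the "can be deleted" point, here is a rough Python sketch
that strips the #%# definitions and %#% references from a grammar
string. It is deliberately naive and rests on assumptions: definition
maps contain no nested braces, and the markers never occur inside ixml
string literals or comments. The function name is made up.

```python
import re

# A definition: "#%#" followed by a brace-delimited map with no nested braces.
DEF_RE = re.compile(r"#%#\s*\{[^{}]*\}[ \t]*\n?")

# A reference: optional leading blanks, "%#%", a name, and an optional
# parameter in either the lookup style (?NFC) or the parenthesis style ((NFC)).
REF_RE = re.compile(
    r"[ \t]*%#%[A-Za-z_][A-Za-z0-9_.\-]*"
    r"(?:\?[A-Za-z0-9_\-]+|\([^()]*\))?"
)


def strip_pragmas(grammar: str) -> str:
    """Remove pragma definitions first, then pragma references."""
    without_defs = DEF_RE.sub("", grammar)
    return REF_RE.sub("", without_defs)
```

Run over the example above, this would turn the fullQuote rule back
into `fullQuote = openQuote, quotedText, closeQuote.` with the rest of
the rule untouched.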

How you get parameterized pragmas -- sometimes I want to normalize the
non-terminal as NFC, sometimes as NFKD -- would, I think, fit into the
map construction, and could plausibly be left up to the implementer
except perhaps for a decision about using either a lookup operator style
%#%normal-form?NFC

or parentheses

%#%normal-form(NFC)

which would perhaps do a better job of supporting multiple params if
that's a place people want to go.
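
For comparison, a small Python sketch that accepts either reference
style and returns the pragma name with its parameters. The function
name, the comma as parameter separator, and the allowed name characters
are assumptions for illustration only.

```python
import re

# "%#%" + name, then optionally "?param" or "(param, param, ...)".
REF_RE = re.compile(
    r"%#%(?P<name>[A-Za-z_][A-Za-z0-9_.\-]*)"
    r"(?:\?(?P<lookup>[A-Za-z0-9_\-]+)|\((?P<args>[^()]*)\))?"
)


def parse_pragma_ref(text: str):
    """Return (name, [params]) for a pragma reference, or None if malformed."""
    match = REF_RE.fullmatch(text.strip())
    if match is None:
        return None
    if match.group("lookup") is not None:
        # Lookup style carries exactly one parameter.
        params = [match.group("lookup")]
    elif match.group("args") is not None:
        # Parenthesis style: split on commas, dropping empty pieces.
        params = [p.strip() for p in match.group("args").split(",") if p.strip()]
    else:
        params = []
    return match.group("name"), params
```

The lookup branch can only ever yield one parameter, while the
parenthesis branch extends to several without new syntax.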


-- Graydon


--  
Graydon Saunders  | graydonish@fastmail.com
Þæs oferéode, ðisses swá mæg.
-- Deor  ("That passed, so may this.")

Received on Wednesday, 5 February 2025 16:35:49 UTC