what are pragmas for? from C. M. Sperberg-McQueen on 2022-01-25 (public-ixml@w3.org from January 2022)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Tue, 25 Jan 2022 11:17:26 -0700
To: public-ixml@w3.org
Message-ID: <87v8y7fznt.fsf@blackmesatech.com>

On this morning's call it was proposed to spend some time discussing by
email the question "what are pragmas for?"  I expect this may be tricky,
because I have the impression that when the topic has come up in the
past, there has been general agreement on some formulation like "pragmas
are a way of talking to a processor in ways not constrained by the
spec", but that agreement does not appear to have run very deep.

Speaking quite generally, I think the role of pragmas in ixml should
resemble the roles of processing instructions in XML, of pragmas in
standard C, of pragmas and annotations in XQuery, the use-when mechanism
in XSLT, and extension elements and attributes in XSLT.  And in
particular I think the specification of pragmas in the XQuery spec is
quite good.

In our work, Tom and I formulated the purpose of pragmas thus:

    The general idea of pragmas is to provide a channel for information
    that is not a required part of the ixml specification but can be
    used by some implementations to provide useful behavior, without
    interfering with the operation of other implementations for which
    the information is irrelevant. ...

    On this view, pragmas are a form of annotation, and we use the terms
    pragma and annotation accordingly.

It is always a bit tricky to design mechanisms for non-standard
information, because if specific use cases are described the discussion
risks running down the rabbit hole of how a particular use case should
be solved, and the designers risk over-fitting the design to the
particular use case they had in mind.  In general there won't be
consensus on how to solve the use case or whether it's worth
solving. (If there were consensus on both, why would the mechanism not
be part of the spec?)  And if people steer completely away from use
cases, the mechanisms intended to support non-standard communication
will tend to be full of hand-waving and won't always work out well in
practice.

Some time ago I came to believe that the best available approach is to
try to think of at least two or three quite different use cases and
ensure that the design is general enough to handle them all and specific
enough that you can in fact sketch out how someone might address each
use case using the mechanisms provided.

One of the most obvious motivations for nonstandard communication with a
processor is to support functionality not present in the spec.  So while
pragmas are not the same as extension mechanisms, I think pragmas may
often be used to extend the base language.  And among the pragma use
cases Tom and I identified, several involve adding functionality to
ixml.  Pragmas aren't necessarily the best way to add functionality to
ixml or to any spec, but once the spec is out, the choices are either
single-handed extensions to the spec, possibly with incompatible syntax
changes, or pragmas.

The use cases Tom and I identified and discussed are:

  - name indirection / dynamic names, so a processor can take the name
    of an element or attribute from the input string and not from the
    grammar

  - rule rewriting, so a processor can parse the input string with a
    grammar related to but not identical to the grammar specified (as in
    John Lumley's use case)

  - text injection, so a processor can inject text or attribute values
    into the XML representation of the input

  - attribute grammar specification, so a processor can use ixml grammar
    notation for the base context-free grammar in an attribute-grammar
    system

  - adding namespace support to ixml, so a processor can produce XML
    output with elements in identified namespaces

Some other use cases, which I have handled using extension elements and
extension attributes in the XML form of grammars, include

  - recording whether a given nonterminal is known to recognize a
    regular language

  - recording whether a given nonterminal is recursive or not

  - recording whether a given nonterminal is satisfiable or not (=
    whether the language it recognizes has any sentences or not)

  - when processing an ixml grammar G and creating a new related ixml
    grammar G', recording information about the nonterminals or rules in
    G from which a given construct in G' is derived
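
To make the "recording" use cases a little more concrete, here is a
minimal Python sketch of one of them: working out which nonterminals are
recursive and recording the result as an extension attribute on each
rule element.  The grammar is a simplified illustration of the XML form,
and the namespace URI and the attribute name "recursive" are invented
for this example; they are not the names actually used in the work
described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for the extension attribute (an assumption).
EXT_NS = "http://example.com/ns/grammar-annotations"

# Simplified, illustrative XML form of a two-rule grammar.
GRAMMAR_XML = """
<ixml>
  <rule name="expr">
    <alt><nonterminal name="term"/><nonterminal name="expr"/></alt>
    <alt><nonterminal name="term"/></alt>
  </rule>
  <rule name="term">
    <alt><literal string="x"/></alt>
  </rule>
</ixml>
"""


def references(rule):
    """Names of all nonterminals mentioned on the right-hand side of a rule."""
    return {nt.get("name") for nt in rule.iter("nonterminal")}


def is_recursive(name, refs):
    """True if `name` can reach itself through one or more references."""
    seen, todo = set(), list(refs.get(name, ()))
    while todo:
        current = todo.pop()
        if current == name:
            return True
        if current not in seen:
            seen.add(current)
            todo.extend(refs.get(current, ()))
    return False


def annotate_recursion(root):
    """Record 'true' or 'false' in an extension attribute on every rule."""
    rules = {rule.get("name"): rule for rule in root.iter("rule")}
    refs = {name: references(rule) for name, rule in rules.items()}
    for name, rule in rules.items():
        value = "true" if is_recursive(name, refs) else "false"
        rule.set(f"{{{EXT_NS}}}recursive", value)
    return root


annotated = annotate_recursion(ET.fromstring(GRAMMAR_XML))
for rule in annotated.iter("rule"):
    print(rule.get("name"), rule.get(f"{{{EXT_NS}}}recursive"))
```

A processor that knows nothing about the extension namespace can ignore
the attribute and still read the grammar unchanged.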

It does not seem surprising to me that almost all of these use cases can
be understood as annotations of particular constructs in an ixml
grammar: if a pragma is a way for a grammar writer to communicate with a
processor in a non-standard way, then it may be worthwhile to ask "what
would a grammar writer have to *say* to a processor?  What would they
want to talk about?"  The one relatively certain overlap in interests
between a grammar writer and a processor is grammars, and in particular
the grammar in which the pragma is embedded, and the processing of that
grammar or input being parsed against that grammar.

Some requirements or desiderata seem to me to follow from the idea that
pragmas provide a channel for non-standard information that software may
act upon.

  - Because pragmas are for communication with software, it is of course
    helpful if the definition of a given pragma can assign structure to
    the pragma; that is one way in which pragmas appear to me to differ
    from comments in general, which are not necessarily intended for
    automated processing and thus need not always have a clear internal
    structure.  That is not to say that pragmas are not useful for the
    human reader as well.

    This is why Tom and I propose that the pragma element in a vxml
    grammar be allowed to contain XML elements in addition to the
    pragma-data element.

  - Because pragmas are for non-standard communication, not every
    processor will understand a given pragma.  It is thus either
    desirable or essential to have some way for processors to decide
    reliably whether a given pragma is one they understand and can /
    should act on or not.

    This is why Tom and I thought that a realistic pragmas proposal
    requires some way to use QNames and thus to bind prefixes to
    namespaces.

  - Because pragmas will often involve annotation of specific
    grammatical constructs, it is desirable that it be relatively
    straightforward to ensure that a given pragma gets attached to the
    correct construct.  I think a reasonable operational interpretation
    of this desideratum is: ensure that the pragma element appears in
    the XML as the child of the appropriate element.

    This is why Tom and I devoted some considerable time to adjusting
    the grammar to allow pragmas to be used where our other annotation
    constructs (namely mark and tmark) are used.
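
The three desiderata above can be illustrated together with a small
Python sketch: a processor walks the XML form of a grammar, expands each
pragma's QName using known prefix bindings, acts only on the pragmas it
recognizes, and associates each one with its parent element.  Only the
element names pragma and pragma-data are taken from the discussion
above.  The "pname" attribute, the prefixes, the namespace URIs, and the
way the bindings are supplied (a plain dictionary) are all assumptions
made for the sake of the example, not a description of the proposal's
actual syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML form of a one-rule grammar carrying two pragmas.
GRAMMAR_XML = """
<ixml>
  <rule name="date">
    <pragma pname="ex:regular"><pragma-data>yes</pragma-data></pragma>
    <pragma pname="other:hint"><pragma-data>whatever</pragma-data></pragma>
    <alt><nonterminal name="year"/></alt>
  </rule>
</ixml>
"""

# Assumed prefix bindings; how a grammar would declare them is left open.
PREFIXES = {
    "ex": "http://example.com/ns/annotations",
    "other": "http://example.org/ns/unknown",
}

# Pragmas this imaginary processor understands: (namespace, local name).
UNDERSTOOD = {("http://example.com/ns/annotations", "regular")}


def expand_qname(qname, prefixes):
    """Resolve 'prefix:local' to (namespace URI, local name), or None."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        return None
    return (prefixes[prefix], local)


def collect_pragmas(root, prefixes, understood):
    """List (parent tag, parent name, expanded name, data) for understood pragmas."""
    found = []
    for parent in root.iter():
        for child in parent:
            if child.tag != "pragma":
                continue
            expanded = expand_qname(child.get("pname", ""), prefixes)
            if expanded not in understood:
                continue  # not one of ours: ignore it and carry on
            data = child.findtext("pragma-data", default="")
            found.append((parent.tag, parent.get("name"), expanded, data))
    return found


annotations = collect_pragmas(ET.fromstring(GRAMMAR_XML), PREFIXES, UNDERSTOOD)
print(annotations)
```

Because each pragma is looked up by its expanded name, two processors
that happen to pick the same local name in different namespaces do not
act on each other's pragmas by accident, and the parent element tells
the processor which construct the annotation applies to.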


In identifying these desiderata I am strictly speaking going beyond the
question "what are pragmas for?", but it seems clear that we will not
always understand each other's answers to that question if we don't talk
about the implications of the answers.  I identify the desiderata above
not because I want to make them the topic of conversation but because I
want them to shed light on what I understand it to mean when we say
"pragmas are a way of talking to a processor in ways not constrained by
the spec".

Well, that's my two cents to start.

Michael

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Received on Tuesday, 25 January 2022 18:17:47 UTC