some questions about pragmas, and the arguments as I understand them from C. M. Sperberg-McQueen on 2022-02-02 (public-ixml@w3.org from February 2022)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Wed, 02 Feb 2022 12:18:20 -0700
To: public-ixml@w3.org
Message-ID: <87v8xxoz5v.fsf@blackmesatech.com>

During the group's discussions of pragmas, I confess that I have had
some trouble understanding the arguments people are trying to bring to
bear on the issues.  Sometimes, I have been told, the best way to
understand the arguments on the other side of an issue better is to
appoint oneself an advocate for the other side and formulate those
arguments oneself.  So I am going to try that approach.

Of course, any provision for pragmas involves a number of different
questions, on each of which different arguments may bear.

What follows is an attempt to identify a number of distinct questions
relating to the inclusion of pragmas in our spec and for each question
to identify and summarize in a neutral way the arguments on each side
of the question.  Since an important goal of this exercise is to
understand the arguments on each side of each question, I hope that
members of the group will identify places where the lists of arguments
are incomplete or misleading or badly phrased.

I should define some terminology:

  - By 'out-of-band information' I mean information that does not
    affect the standard interpretation of the relevant ixml grammar.

    In the nature of things, many plausible examples will involve
    out-of-band information that cannot be expressed using our
    grammatical formalism, which is at least approximately the same as
    information for which the ixml spec provides no standardized
    representation.

    Examples already brought forward include requests not to mark
    sentences as ambiguous or to optimize the parsing process in some
    way that does not affect the parsing results but may affect
    resource requirements.  But also in the nature of things, any
    mechanism we provide for out-of-band information can be used by a
    user to include information that could have been expressed
    directly in the ixml grammar but was not.  And going even further,
    such a mechanism can easily be used to include information that
    *is* expressed in the grammar; that would be redundant, but very
    hard to prohibit.

    So any characterization, later on, of information as
    'extra-grammatical' or 'non-standardized' should be taken with a
    grain of salt as a coarse approximation.

    Also in the nature of things, the goals implementors and users
    might seek to achieve by the use of out-of-band information are
    difficult to bound.  Obvious possibilities include varying this or
    that aspect of a processor's operation, overriding defaults, or
    specifying behavior that is not the usual behavior for the
    processor.

  - By 'inline' information I mean information conveyed within the
    sequence of characters submitted to an ixml processor as the ixml
    input grammar.

    Other information available to a processor includes information
    conveyed in the input string and information conveyed by
    invocation-time parameters or options.

  - By information 'directed to a processor' I mean information that a
    processor can successfully parse and process.  The term is not
    intended to suggest and should not be taken to mean that the
    information cannot also be read, understood, and acted on by a
    human being.

  - The 'SM' proposal is the 'straw man' proposal on pragmas put
    forward by Steven Pemberton in response to the TM proposal.

  - The 'TM' proposal is that put forward by Tomos Hillman and me.

  - A 'pragma' is inline out-of-band information directed to a
    processor.

Question 1: should ixml provide for, or allow for, out-of-band
information?

    Note: I do not believe there is currently disagreement on this
    point.
    
    Arguments con: If all out-of-band information were forbidden it
    would be easier to guarantee interoperability.

    Arguments pro: It would be dauntingly difficult to forbid it.
    
    Unless we prohibit the definition of invocation options for a
    processor, the processor will have access to out-of-band
    information.  And even then, it would be hard to guarantee that
    processors have no access to out-of-band information.
    
Question 2: should ixml provide for, or allow for, inline out-of-band
information?

    Note: I do not believe there is currently disagreement on this
    point.
    
    Arguments con: If all inline out-of-band information were
    forbidden it would be easier to guarantee interoperability.

    Arguments pro: It would be dauntingly difficult to forbid it.
    
    Unless we rewrite the grammar of ixml to prohibit comments, the
    processor will have access to inline out-of-band information.  And
    if we did rewrite the grammar of ixml to prohibit comments,
    steganographic techniques could be used to embed inline
    out-of-band information in an ixml input grammar (using the number
    and identity of whitespace characters to encode information, for
    example).
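
    The whitespace technique just mentioned can be sketched roughly as
    follows.  This is an illustration only: the encoding (space for a
    0 bit, tab for a 1 bit, one byte per line) and the function names
    hide and reveal are invented here and belong to no proposal, and
    the sketch assumes that each line of the grammar ends at a point
    where whitespace is permitted.

```python
def hide(grammar: str, secret: bytes) -> str:
    """Carry one byte per line as 8 trailing characters: space = 0, tab = 1."""
    lines = grammar.split("\n")
    if len(secret) > len(lines):
        raise ValueError("grammar has too few lines to carry the secret")
    out = []
    for i, line in enumerate(lines):
        line = line.rstrip(" \t")  # discard any existing trailing whitespace
        if i < len(secret):
            bits = format(secret[i], "08b")
            line += "".join("\t" if b == "1" else " " for b in bits)
        out.append(line)
    return "\n".join(out)


def reveal(grammar: str) -> bytes:
    """Read bytes back from trailing whitespace, stopping at the first line without 8 of them."""
    result = bytearray()
    for line in grammar.split("\n"):
        stripped = line.rstrip(" \t")
        tail = line[len(stripped):]
        if len(tail) != 8:
            break
        result.append(int("".join("1" if c == "\t" else "0" for c in tail), 2))
    return bytes(result)


# Hypothetical three-rule grammar used only as a carrier.
sample = 'a: "x", b.\nb: "y".\nc: "z".'
hidden = hide(sample, b"ok")
assert reveal(hidden) == b"ok"
```

    Stripping trailing whitespace from the result gives back the
    original text, so the hidden bytes travel inline without changing
    what the grammar says.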
    
Question 3: should ixml provide for, or allow for, inline out-of-band
information directed to processors?

    Note: I do not believe there is currently disagreement on this
    point.
    
    Arguments con: If all inline out-of-band information directed to
    processors were forbidden it would be easier to guarantee
    interoperability.

    Arguments pro: It would be dauntingly difficult to forbid it.
    
    Unless we rewrite the grammar of ixml to prohibit comments, users
    can use comments to embed out-of-band information directed to a
    processor in an ixml input grammar.  If we do forbid comments, users
    can use other means to do so.
    
Question 4: should ixml provide distinct constructs for inline
out-of-band information directed to processors and other inline
out-of-band information?

    Note: It is not clear whether there is currently disagreement on
    this point.

    Both the TM proposal and the SM proposal provide distinct
    constructs, with nonterminals named 'pragma' and 'comment'.  But it
    is not clear that those proposals represent the full range of current
    opinion in the group.

    The natural interpretation of each proposal is that the 'pragma'
    construct is intended (as the name suggests) for inline
    out-of-band information directed to a processor, and the 'comment'
    construct is intended for other inline out-of-band information.

    In the nature of things, there is nothing that can prevent a user
    from using either construct in the 'other' way.  The implicit
    assumption appears to be that the interoperability advantages of
    writing pragmas using the 'pragma' nonterminal and the relative
    pointlessness of writing other inline out-of-band information that
    way will suffice to ensure that at a first approximation pragmas are
    written using the 'pragma' construct and other inline out-of-band
    information using the 'comment' construct. As far as I know, no
    relevant arguments on any side of any question rely on the
    correlation being perfect.

    Arguments pro:

    . Providing distinct constructs allows those who believe pragmas
    and comments are usefully distinguished to do so and thus makes
    ixml grammars clearer and easier to understand.

    . Providing distinct constructs reduces the probability that a
    comment intended only as a human-readable observation (e.g. the
    comment {!} to mark a grammatical rule that the author thinks the
    reader might not have expected to see) is misinterpreted by a
    processor as a pragma (e.g. as a request to behave in a particular
    way); it thus improves the likelihood that a grammar that works
    successfully with one processor will also work successfully with
    others.
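
    The misinterpretation risk in the second argument can be sketched
    roughly as follows.  Everything here is hypothetical: the
    '{[' ... ']}' pragma delimiters, the pragma name my:optimize, and
    both toy "processors" are invented for illustration, and the
    regular expressions ignore nested comments and braces inside
    string literals.

```python
import re

# Simplified comment: '{' ... '}' not starting with '{[' and with no nested braces.
COMMENT = re.compile(r"\{(?!\[)([^{}]*)\}")
# Hypothetical distinct pragma construct: '{[' ... ']}'.
PRAGMA = re.compile(r"\{\[([^{}\[\]]*)\]\}")


def directives_from_comments(grammar: str) -> list:
    """Toy processor that treats any comment starting with '!' as a directive."""
    return [m.group(1) for m in COMMENT.finditer(grammar)
            if m.group(1).startswith("!")]


def directives_from_pragmas(grammar: str) -> list:
    """Toy processor that looks for directives only in the pragma construct."""
    return [m.group(1).strip() for m in PRAGMA.finditer(grammar)]


# The author wrote {!} purely as a note to human readers.
grammar = 'date: year, "-", month. {!}\n{[ my:optimize ]} year: digit+.'

# The comment-scanning processor picks up the human-only marker as a directive.
assert directives_from_comments(grammar) == ["!"]
# The pragma-scanning processor sees only what was written as a pragma.
assert directives_from_pragmas(grammar) == ["my:optimize"]
```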
    
    Arguments contra:

    . Providing a defined construct for implementation-defined
    behavior, or even for behavior to be defined by some future
    version of a specification, serves as a signal to implementors
    that it is acceptable and perhaps even expected that they may or
    should provide non-standard behaviors that can be invoked that
    way.  It thus reduces the likelihood that a grammar that works
    successfully with one processor will produce the same results with
    others.

    . Providing a defined construct for implementation-defined
    behavior allows (and may encourage) implementations to use that
    construct to specify new or extended behavior, even in cases where
    the behavior should (on the grounds of technical soundness and
    quality of design) be provided by different syntax and be built
    into the base specification.

    If (for example) a programming language provides no type system
    but does provide a pragma construct, it would be a design error
    for compilers to provide type checking by means of pragmas: the
    type system should be built into the language, not added by means
    of pragmas.  

    If (for example) a styling language provides no styling property
    to specify a particular kind of text rendering, but does provide
    for implementation-defined styling properties, it would be a
    design error for renderers to provide control over that property
    by using an implementation-defined property: the property should
    be built into the language, not added by means of
    implementation-defined properties.

Question 5: If ixml provides distinct nonterminals for 'pragma' and
'comment' (for pragmas and other inline out-of-band information,
respectively), how should the two nonterminals be related,
grammatically?

    (a) The set of strings generated by 'pragma' should be a subset of
    those generated by 'comment'.

    This makes the statement 'pragmas are comments' true by
    grammatical construction.

    (b) The set of strings generated by 'pragma' and the set generated
    by 'comment' should be disjoint, but the delimiters should be
    chosen so as to be visually similar and convey an underlying
    affinity between the two constructs.

    This makes the statement 'pragmas are comments' not true as
    regards the grammatical constructs, but apposite as a metaphor.

    (c) The set of strings generated by 'pragma' and the set generated
    by 'comment' should be disjoint, and the delimiters should be
    distinct, so as to convey that the two constructs are distinct.


    Note: the choice among (a), (b), and (c) appears to depend in part
    on whether 'comment' is taken as a name for 'inline out-of-band
    information' or for 'inline out-of-band information other than
    pragmas'.
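
    The three options can be modelled loosely with regular
    expressions.  This is a sketch under stated assumptions, not the
    grammar of either proposal: the delimiters chosen for each option
    are hypothetical, and comments are simplified to exclude nesting.

```python
import re

# Simplified comment: '{' ... '}' with no nested braces.
COMMENT_ANY = re.compile(r"\{[^{}]*\}")
# (a) A pragma is a comment whose content is wrapped in '[' ... ']',
#     so every pragma string is also a comment string.
PRAGMA_BRACKETED = re.compile(r"\{\[[^{}]*\]\}")
# (b) Same pragma syntax, but 'comment' is narrowed to exclude strings
#     beginning '{[', which makes the two sets disjoint.
COMMENT_NOT_PRAGMA = re.compile(r"\{(?!\[)[^{}]*\}")
# (c) Pragma with unrelated delimiters (hypothetical '<?' ... '?>').
PRAGMA_DISTINCT = re.compile(r"<\?[^?]*\?>")


def classify(text, comment_re, pragma_re):
    """Return the set of constructs whose pattern matches the whole string."""
    labels = set()
    if comment_re.fullmatch(text):
        labels.add("comment")
    if pragma_re.fullmatch(text):
        labels.add("pragma")
    return labels


# (a): the pragma string belongs to both sets.
assert classify("{[my:opt]}", COMMENT_ANY, PRAGMA_BRACKETED) == {"comment", "pragma"}
# (b): similar delimiters, but the pragma is no longer a comment.
assert classify("{[my:opt]}", COMMENT_NOT_PRAGMA, PRAGMA_BRACKETED) == {"pragma"}
# (c): visibly different delimiters, disjoint sets.
assert classify("<?my:opt?>", COMMENT_ANY, PRAGMA_DISTINCT) == {"pragma"}
# An ordinary remark is a comment under all three options.
assert classify("{plain remark}", COMMENT_NOT_PRAGMA, PRAGMA_BRACKETED) == {"comment"}
```

    Under (a), 'comment' names all inline out-of-band information;
    under (b) and (c), it names only what is left once pragmas are
    taken out.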
    

    Note: It is not clear whether there is currently disagreement on
    this point.

    The SM proposal follows path (b).

    The TM proposal uses distinct delimiters for comments and pragmas,
    but the authors of the proposal have indicated that they would be
    willing to accept the delimiter pairs '{[' ... ']}' or '{|'
    ... '|}' and '⦃' ... '⦄' for pragmas, while retaining '{' ... '}'
    for comments (assuming appropriate grammatical adjustments to
    prevent ambiguity).


    Arguments for (a)
    
    - If by nature pragmas are comments, then the grammar should
      reflect that fact.

    Arguments for (c)
    
    - If by nature pragmas and comments are distinct objects, then the
      grammar should reflect that fact.

    Arguments for (b)
    
    - If some members of the group feel strongly that by nature
      pragmas are comments, while others feel that pragmas and
      comments are distinct objects and neither is a subset of the
      other, then the grammar cannot fully satisfy both views.  But if
      the similarity of the two constructs can be captured by a
      similarity of delimiters, and the distinctness of the two
      constructs can be captured by making them distinct
      grammatically, then holders of each view may find the spec
      workable.


A number of other questions arise, relating to the internal syntax and
semantics of pragmas and relating to their place in the larger syntax
of ixml, but this mail is already long enough that I expect some
readers will be prepared to accuse the author of over-thinking things.
So I will stop here.  If we can understand where members of the group
are on the questions identified above, and the arguments that lead
them to their positions, I think it might conduce to progress.

So I repeat my request that members of the group help other members of
the group understand their positions by correcting the formulation of
what they recognize as their arguments, or by providing formulations
for arguments they believe are relevant but missing.

If I have distorted or omitted any argument any member of the group has
brought forward, you may reliably take it as an indication that you did
not make it clearly enough for me to understand and remember it; please
make it again!

(Note, however, that I have attempted to phrase arguments in a neutral
tone, so if the only acceptable formulation of your views begins with
"it is obvious that ...", you may be disappointed by my paraphrase.  But
the point of the exercise is to formulate the arguments in a way that
lets people understand them even if they disagree with them; phrases
that rhetorically demand assent are counter-productive.  Meta-arguments
of the form "X outweighs Y" are also unhelpful; if X and Y are the
relevant arguments, and X weighs for your position, then it is already
evident which argument you find weightier.)

I hope this helps.

Michael

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Received on Wednesday, 2 February 2022 19:18:44 UTC