two objections to the pragmas proposal

During the call earlier today, Steven Pemberton summarized his
objections to the pragmas proposal put forward by Tom Hillman and me in
three points, of which I unfortunately remember only two: it's too
complicated and it goes beyond its remit.  I'd like to address these two
issues, if I can do so usefully.

(1) On the issue of our remit, I think Tom and I have already answered
the objection.  We did not see a way to make pragmas work well without
some mechanism for distributed naming, so we faced the choice between
making a proposal for distributed naming part of the pragmas proposal or
not making a pragmas proposal.

Those who believe distributed naming is not necessary in order to
satisfy the requirements Tom and I identified are welcome to make their
case that it's not necessary, or that our requirements are too
stringent.  But so far I have not seen anyone making either of those
cases.

The proposal we made for distributed naming was to reuse the QName
mechanism now familiar to pretty much anyone who uses XML seriously,
including people who avoid namespaces wherever possible.  In principle,
as we said last week, any other mechanism would do as well.  Having
thought about it in the last week, I think SP's strawman proposal has
persuaded me that QNames are really the only plausible solution in an
XML context, because they are familiar and well understood.  Any other
mechanism will elicit the question from users "why didn't you just use
QNames?"

I understand that not everyone in the group thought, when Tom and I took
an action to develop a pragmas proposal, that it would also entail a
proposal for namespaces or something like them.  But our remit was to
produce a workable proposal for pragmas; I think that any workable
pragmas proposal requires a workable proposal for QNames.  As I say,
anyone in or outside the group is welcome to explain why it doesn't.


(2) On the issue of complication, I would most of all like a bit more
specificity.  It's hard to answer so vague and sweeping an objection,
and I am reduced to guessing which parts of the proposal people think
are too complicated.

Taking SP's strawman proposal as a baseline level of complexity (and
using the names Tinman (TM) and Strawman (SM) for brevity), I think I see
some areas in which SM is simpler than TM, some in which it's more
complex, and a number of areas where the changes don't seem to make any
significant difference.

- Several ways in which SM differs from TM appear to me irrelevant to
  questions of complexity -- that is, they neither make things simpler
  nor make them more complicated.  Among these I would list:

    . The change in delimiters.
    . The prohibition on empty comments.
    . Allowing empty processor specifications.
    . Forbidding blanks but not newline, tab, or other whitespace
      characters within the processor specification.
    . Requiring blank and not allowing newline, tab, or any other
      whitespace to separate the processor specification from the
      pragma body.
    . Defining the XML form of pragmas as having mixed content rather
      than element content.

  (These are all things I think of as weaknesses in SM, but none seems
  to be intended as a simplification.)  

- In SM, pragmas have a slightly more complicated internal structure
  than in TM, since processors are required to recognize comments and
  pragmas embedded in pragmas.

  (I think, by the way, that this is a design error and contradicts the
  principle that "The structure of body text of any pragma is defined by
  the processor it is addressed to."  A better design allows the pragma
  delimiters to occur within a pragma, but without requiring that when
  they are encountered they define a syntactically legal pragma.  And
  ditto for comments.  But that is not directly relevant to the question
  of complexity.)

- In SM, pragmas are allowed wherever whitespace and comments are
  allowed, which reduces complexity as measured by the number of changes
  to the grammar.

  I wonder if the other changes to the grammar in TM are what SP has in
  mind when he says it's "too complicated"; I suspect they are.

  On the other hand, as far as I can tell SM is more complex to use for
  the grammar writer, especially but not exclusively the grammar writer
  who cares about the XML form of the grammar.

  The reason for TM's design in this area is that in every use case
  anyone has reported for pragmas, the pragma can be understood as an
  annotation on a symbol in the right-hand side of a rule, on a rule, or
  on the grammar itself.  There may be other use cases which have
  different requirements, but so far no one has mentioned any.  So TM
  reflects an attempt to make the syntax of pragmas suitable in those
  three cases.

  The examples of XQuery and ixml itself illustrate that quite often an
  intuitive syntax for annotating a thing puts the annotation before
  the thing. In ixml, to annotate a nonterminal with a mark, we write
  the mark before the nonterminal; the syntax of annotations and pragmas
  in XQuery similarly puts the annotation or the pragma first.  Any
  discussion of attribute grammars will tend to illustrate an opposite
  tendency:  the attribute value assignment rules for any grammar
  production can be viewed as annotations on the rule, but invariably
  follow the rule rather than preceding it.

  TM allows annotations on a symbol to occur before it, before or after
  any mark.  So for a rule of the form

    a : ¿my:red? @b, ¿my:orange? ^c, ¿my:yellow? -d.

  or equivalently

    a : @ ¿my:red? b, ^ ¿my:orange? c, - ¿my:yellow? d.

  the XML form places the pragmas named my:* as children of the
  nonterminal elements:

    <rule name="a">
      <alt>
        <nonterminal mark="@" name="b">
          <pragma pname="my:red"/>
        </nonterminal>
        <nonterminal mark="^" name="c">
          <pragma pname="my:orange"/>
        </nonterminal>
        <nonterminal mark="-" name="d">
          <pragma pname="my:yellow"/>
        </nonterminal>
      </alt>
    </rule>
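
  One consequence of this placement is that a consumer of the XML form
  can find what a pragma annotates just by looking at its parent
  element.  A minimal sketch in Python, using only the standard library;
  the function name pragmas_by_parent is hypothetical and not part of
  either proposal, and the XML is the example above copied verbatim:

```python
import xml.etree.ElementTree as ET

# The XML form of the rule shown above, copied verbatim.
XML_FORM = """\
<rule name="a">
  <alt>
    <nonterminal mark="@" name="b">
      <pragma pname="my:red"/>
    </nonterminal>
    <nonterminal mark="^" name="c">
      <pragma pname="my:orange"/>
    </nonterminal>
    <nonterminal mark="-" name="d">
      <pragma pname="my:yellow"/>
    </nonterminal>
  </alt>
</rule>
"""


def pragmas_by_parent(xml_text):
    """Return (parent tag, parent name, pragma pname) for each pragma.

    Hypothetical helper: it assumes only that a pragma is a direct
    child of the element it annotates, as in the example above.
    """
    root = ET.fromstring(xml_text)
    found = []
    for parent in root.iter():      # document order, root included
        for child in parent:        # direct children only
            if child.tag == "pragma":
                found.append(
                    (parent.tag, parent.get("name"), child.get("pname"))
                )
    return found


if __name__ == "__main__":
    for tag, name, pname in pragmas_by_parent(XML_FORM):
        print(f"{pname} annotates <{tag} name={name!r}>")
```

  No knowledge of the ixml grammar for ixml is needed to answer the
  question "what does this pragma annotate?"; the tree answers it.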

    Other parts of the TM proposal allow pragmas in locations where the
    XML form of the grammar will place the pragma as a child of the
    element representing the thing it annotates (rule or grammar).

    If there is a use case that requires that pragmas be able to occur
    as children of other elements, we need to capture it.  Otherwise,
    any proposal that allows pragmas in other locations risks the charge
    of ... going outside its remit to allow things that are not part of
    the requirements and go well beyond any known use cases.

    In SM, by contrast to TM, pragmas can be located pretty much
    anywhere, which means the grammar writer will need a much better
    grasp of where 's' is used in the ixml grammar for ixml than I
    suspect most people even in the CG will have.  Given a rule like

        a: @b, ^c, -d.

    it is not hard to see (or at least imagine) that comments and
    whitespace can occur in the locations where comments occur below:

        {1}a{2}: {3}@{4}b{5}, {6}^{7}c{8}, {9}-{10}d{11}.{12}

    I suspect that I am not the only member of the CG who would have to
    consult the grammar for ixml to know which element in the XML form
    of this rule will be the parent of each comment.

    If I want a comment or SM pragma placed as a child of the
    nonterminal c, what are my options?  6, 7, and 8, right?  Wrong.
    If I want a comment or SM pragma to appear as a child of the 'rule'
    element, what are my options?
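
    The twelve numbered locations in the example are simply the gaps
    before, between, and after the tokens of the rule.  A small Python
    sketch makes the count mechanical.  Its tokenizer is a deliberate
    simplification assumed for this one rule (names are runs of word
    characters, every other non-blank character is its own token); it
    is not the ixml grammar, and it says nothing about which XML
    element each location would attach to, which is exactly the part
    that requires consulting the grammar.

```python
import re


def mark_gaps(rule):
    """Insert {n} before each token and after the last one.

    Hypothetical helper with a simplified tokenizer: a name is a run
    of word characters; any other non-blank character is one token.
    Returns the annotated rule and the number of gaps.
    """
    tokens = re.findall(r"\w+|\S", rule)
    pieces = []
    for n, token in enumerate(tokens, start=1):
        pieces.append(f"{{{n}}}{token}")
        if token in ":,":
            # Keep the blank that follows ':' and ',' in the example.
            pieces.append(" ")
    gaps = len(tokens) + 1
    pieces.append(f"{{{gaps}}}")
    return "".join(pieces), gaps


if __name__ == "__main__":
    annotated, gaps = mark_gaps("a: @b, ^c, -d.")
    print(annotated)
    print(gaps, "candidate locations")
```

    Eleven tokens give twelve candidate locations for even this short
    rule, and each one needs an answer to the question of where it
    lands in the XML form.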

    From where I sit, the ixml grammar currently does a remarkably good
    job of keeping rules visually simple by keeping the 's' nonterminal
    out of the way; it does this in part by pushing the 's' as far down
    in the parse trees as possible.  But as we have seen with the rules
    for class, for @from, and for @to, that sometimes ends up allowing
    comments in places where we don't want them.  As we saw some months
    ago with the rule for ixml, it also sometimes ends up not allowing
    comment elements in the XML form of the grammar in places where we
    do want them.

    If we allow 's' to determine not just where whitespace and comments
    can go but also where pragmas can go, I think the treatment of 's'
    needs re-thinking from the ground up: we will be obligating
    ourselves either to a long and very tedious process of examining
    every occurrence of 's' in the grammar and thinking about where it
    should attach in the parse tree, or to waving our hands and saying a
    bit crossly "it doesn't matter!".  But it does matter.

I hope that explains why I am not yet persuaded that the TM proposal is
too complicated.

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Tuesday, 18 January 2022 18:54:33 UTC