- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Tue, 18 Jan 2022 11:54:12 -0700
- To: public-ixml@w3.org
During the call earlier today, Steven Pemberton summarized his
objections to the pragmas proposal put forward by Tom Hillman and me in
three points, of which I unfortunately remember only two: it's too
complicated and it goes beyond its remit. I'd like to address these two
issues, if I can do so usefully.
(1) On the issue of our remit, I think Tom and I have already answered
the objection. We did not see a way to make pragmas work well without
some mechanism for distributed naming, so we faced the choice between
making a proposal for distributed naming part of the pragmas proposal or
not making a pragmas proposal.
Those who believe distributed naming is not necessary in order to
satisfy the requirements Tom and I identified are welcome to make their
case that it's not necessary, or that our requirements are too
stringent. But so far I have not seen anyone making either of those
cases.
The proposal we made for distributed naming was to reuse the QName
mechanism now familiar to pretty much anyone who uses XML seriously,
including people who avoid namespaces wherever possible. In principle,
as we said last week, any other mechanism would do as well. Having
thought about it over the past week, I find that SP's strawman proposal
has persuaded me that QNames are really the only plausible solution in an
XML context, because they are familiar and well understood. Any other
mechanism will elicit the question from users "why didn't you just use
QNames?"
I understand that not everyone in the group thought, when Tom and I took
an action to develop a pragmas proposal, that it would also entail a
proposal for namespaces or something like them. But our remit was to
produce a workable proposal for pragmas; I think that any workable
pragmas proposal requires a workable proposal for QNames. As I say,
anyone in or outside the group is welcome to explain why it doesn't.
(2) On the issue of complication, I would most of all like a bit more
specificity. It's hard to answer so vague and sweeping an objection,
and I am reduced to guessing which parts of the proposal people think
are too complicated.
Taking SP's strawman proposal as a baseline level of complexity (and
using the names Tinman (TM) and Strawman (SM) for brevity), I think I see
some areas in which SM is simpler than TM, some in which it's more
complex, and a number of areas where the changes don't seem to make any
significant difference.
- Several ways in which SM differs from TM appear to me irrelevant to
questions of complexity -- that is, they neither make things simpler
nor make them more complicated. Among these I would list:
. The change in delimiters.
. The prohibition on empty comments.
. Allowing empty processor specifications.
. Forbidding blanks but not newline, tab, or other whitespace
characters within the processor specification.
. Requiring a blank, and not allowing newline, tab, or any other
whitespace to separate the processor specification from the
pragma body.
. Defining the XML form of pragmas as having mixed content rather
than element content.
(These are all things I think of as weaknesses in SM, but none seems
to be intended as a simplification.)
- In SM, pragmas have a slightly more complicated internal structure
than in TM, since processors are required to recognize comments and
pragmas embedded in pragmas.
(I think, by the way, that this is a design error and contradicts the
principle that "The structure of body text of any pragma is defined by
the processor it is addressed to." A better design allows the pragma
delimiters to occur within a pragma, but without requiring that when
they are encountered they define a syntactically legal pragma. And
ditto for comments. But that is not directly relevant to the question
of complexity.)
- In SM, pragmas are allowed wherever whitespace and comments are
allowed, which reduces complexity as measured by the number of changes
to the grammar.
I wonder whether the other changes to the grammar in TM are what SP has
in mind when he says it's "too complicated"; I suspect they are.
On the other hand, as far as I can tell SM is more complex to use for
the grammar writer, especially but not exclusively the grammar writer
who cares about the XML form of the grammar.
The reason for TM's design in this area is that in every use case
anyone has reported for pragmas, the pragma can be understood as an
annotation on a symbol in the right-hand side of a rule, on a rule, or
on the grammar itself. There may be other use cases which have
different requirements, but so far no one has mentioned any. So TM
reflects an attempt to make the syntax of pragmas suitable in those
three cases.
The examples of XQuery and ixml itself illustrate that quite often an
intuitive syntax for annotating any thing puts the annotation before
the thing. In ixml, to annotate a nonterminal with a mark, we write
the mark before the nonterminal; the syntax of annotations and pragmas
in XQuery similarly puts the annotation or the pragma first. Any
discussion of attribute grammars will tend to illustrate an opposite
tendency: the attribute value assignment rules for any grammar
production can be viewed as annotations on the rule, but invariably
follow the rule rather than preceding it.
TM allows annotations on a symbol to occur before it, before or after
any mark. So for a rule of the form
a : ¿my:red? @b, ¿my:orange? ^c, ¿my:yellow? -d.
or equivalently
a : @ ¿my:red? b, ^ ¿my:orange? c, - ¿my:yellow? d.
the XML form places the pragmas named my:* as children of the
nonterminal elements:
    <rule name="a">
      <alt>
        <nonterminal mark="@" name="b">
          <pragma pname="my:red"/>
        </nonterminal>
        <nonterminal mark="^" name="c">
          <pragma pname="my:orange"/>
        </nonterminal>
        <nonterminal mark="-" name="d">
          <pragma pname="my:yellow"/>
        </nonterminal>
      </alt>
    </rule>
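To make the TM placement rule concrete, here is a minimal Python
sketch. It assumes a drastically simplified rule syntax: one
alternative, nonterminals only, at most one mark and one pragma per
symbol, and each pragma written as ¿pname? with no body. The function
rule_to_xml and the regular expression TERM are hypothetical
illustrations, not part of either proposal or of any ixml processor.
The point is only that both spellings of rule a above yield the same
XML, with each pragma a child of the nonterminal it annotates.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern for one term of the simplified syntax: an optional
# mark and an optional pragma (in either order), then a nonterminal name.
TERM = re.compile(
    r"\s*(?:(?P<m1>[@^-])\s*)?"         # mark written before the pragma
    r"(?:¿(?P<p>[\w:.-]+)\?\s*)?"       # pragma: ¿pname? (no body handled)
    r"(?:(?P<m2>[@^-])\s*)?"            # mark written after the pragma
    r"(?P<name>\w+)\s*"                 # the nonterminal being annotated
)


def rule_to_xml(rule: str) -> ET.Element:
    """Convert 'a : term, term, ... .' into a <rule> element (sketch only)."""
    name, rhs = rule.split(":", 1)      # rule names contain no colon here
    rhs = rhs.strip()
    if not rhs.endswith("."):
        raise ValueError("rule must end with '.'")
    rule_el = ET.Element("rule", name=name.strip())
    alt_el = ET.SubElement(rule_el, "alt")
    for term in rhs[:-1].split(","):
        m = TERM.fullmatch(term)
        # Reject anything outside the subset, including two marks on one symbol.
        if m is None or (m["m1"] and m["m2"]):
            raise ValueError(f"unsupported term: {term!r}")
        attrs = {}
        mark = m["m1"] or m["m2"]
        if mark:
            attrs["mark"] = mark
        attrs["name"] = m["name"]
        nt = ET.SubElement(alt_el, "nonterminal", attrs)
        if m["p"]:
            # The pragma becomes a child of the nonterminal it annotates,
            # regardless of whether it was written before or after the mark.
            ET.SubElement(nt, "pragma", pname=m["p"])
    return rule_el


if __name__ == "__main__":
    r1 = rule_to_xml("a : ¿my:red? @b, ¿my:orange? ^c, ¿my:yellow? -d.")
    r2 = rule_to_xml("a : @ ¿my:red? b, ^ ¿my:orange? c, - ¿my:yellow? d.")
    print(ET.tostring(r1, encoding="unicode"))
    print(ET.tostring(r1) == ET.tostring(r2))
```

Both calls produce identical trees. A real processor would also have
to handle pragma bodies, terminals, multiple alternatives, and
comments, none of which this sketch attempts.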
Other parts of the TM proposal allow pragmas in locations where the
XML form of the grammar will place the pragma as a child of the
element representing the thing it annotates (rule or grammar).
If there is a use case that requires that pragmas be able to occur
as children of other elements, we need to capture it. Otherwise,
any proposal that allows pragmas in other locations risks the charge
of ... going outside its remit to allow things that are not part of
the requirements and go well beyond any known use cases.
In SM, by contrast to TM, pragmas can be located pretty much
anywhere, which means the grammar writer will need a much better
grasp of where 's' is used in the ixml grammar for ixml than I
suspect most people even in the CG will have. Given a rule like
a: @b, ^c, -d.
it is not hard to see (or at least imagine) that comments and
whitespace can occur at the locations marked by numbered comments below:
{1}a{2}: {3}@{4}b{5}, {6}^{7}c{8}, {9}-{10}d{11}.{12}
I suspect that I am not the only member of the CG who would have to
consult the grammar for ixml to know which element in the XML form
of this rule will be the parent of each comment.
If I want a comment or SM pragma placed as a child of the
nonterminal c, what are my options? 6, 7, and 8, right? Wrong.
If I want a comment or SM pragma to appear as a child of the 'rule'
element, what are my options?
From where I sit, the ixml grammar currently does a remarkably good
job of keeping rules visually simple by keeping the 's' nonterminal
out of the way; it does this in part by pushing the 's' as far down
in the parse trees as possible. But as we have seen with the rules
for class, for @from, and for @to, that sometimes ends up allowing
comments in places where we don't want them. As we saw some months
ago with the rule for ixml, it also sometimes ends up not allowing
comment elements in the XML form of the grammar in places where we
do want them.
If we allow 's' to determine not just where whitespace and comments
can go but also where pragmas can go, I think the treatment of 's'
needs re-thinking from the ground up: we will be obligating
ourselves either to a long and very tedious process of examining
every occurrence of 's' in the grammar and thinking about where it
should attach in the parse tree, or to waving our hands and saying a
bit crossly "it doesn't matter!". But it does matter.
I hope that explains why I am not yet persuaded that the TM proposal is
too complicated.
--
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Received on Tuesday, 18 January 2022 18:54:33 UTC