Re: Conformance section from C. M. Sperberg-McQueen on 2021-12-03 (public-ixml@w3.org from December 2021)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Fri, 3 Dec 2021 09:05:26 -0700
To: Steven Pemberton <steven.pemberton@cwi.nl>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, ixml <public-ixml@w3.org>
Message-Id: <752EA37E-FC3A-434D-B7E3-295C04220546@blackmesatech.com>
[Language-pedantry alert.  Proceed at your own risk.]

> On 3 Dec 2021, at 3:39 AM, Steven Pemberton <steven.pemberton@cwi.nl> wrote:
> 
> In the final sweep to a release version, I would like us to resolve these questions in the conformance section:
> 1.
> 
> I propose deleting one of these rules, since I believe they are equivalent:
> 
> * All rule names that are serialised must match the requirements for an XML name.
> * All nonterminal names which are marked to be serialised must match the requirements of an XML name.

I think they are not equivalent for a grammar like

  S :  A; B.
  A: 'a'.
  B : 'b'.

Given the input ‘a’, I think the first formulation requires that the names ‘S’ and ‘A’ be
checked to see if they match the requirements of an XML name, but not ‘B’.  I think
the second formulation requires that all three nonterminal names be checked.

That is, the first rule appears to require checking only the names of nonterminals which
are in fact serialized in a given run, and the second rule does not have this limitation.

On a side note, perhaps for ‘nonterminal names’ we could everywhere just read
‘nonterminals’?


> 
> 2.
> 
> I propose deleting the second rule here, since I believe the first one covers it:
> 
> * For every nonterminal name occurring on the right-hand side of a rule, exactly one rule defining that name must exist in the grammar.
> * The grammar must not contain more than one rule defining any given name.

This grammar seems to me to satisfy the first but not the second rule:

  S:  A.
  S:  B.
  A: 'a'.
  B : 'b'.

This grammar, on the other hand, seems to me to satisfy the second but not the first rule:

  S:  A; B.
  A: 'a'.

So I do not currently believe that either rule entails the other.  There would be
less redundancy if “exactly one rule” in the first item were changed to “some rule”
or “at least one rule”.
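
The two counterexamples can be checked mechanically.  The following is a
minimal Python sketch, assuming a hypothetical encoding in which a
grammar is a list of (name, alternatives) pairs and literals are the
symbols that begin with a quote character; the function names are
likewise invented for illustration.

```python
from collections import Counter


def is_nonterminal(symbol):
    # Assumption of this sketch: literals are written with a leading quote.
    return not symbol.startswith("'")


def definition_counts(rules):
    """Count how many rules define each name."""
    return Counter(name for name, _ in rules)


def satisfies_first_rule(rules):
    """Every nonterminal used on a right-hand side has exactly one rule."""
    counts = definition_counts(rules)
    used = {
        symbol
        for _, alternatives in rules
        for alternative in alternatives
        for symbol in alternative
        if is_nonterminal(symbol)
    }
    return all(counts[name] == 1 for name in used)


def satisfies_second_rule(rules):
    """No name is defined by more than one rule."""
    return all(count <= 1 for count in definition_counts(rules).values())


# S is defined twice, but S never occurs on a right-hand side.
TWO_RULES_FOR_S = [
    ("S", [["A"]]),
    ("S", [["B"]]),
    ("A", [["'a'"]]),
    ("B", [["'b'"]]),
]

# B is referenced but never defined.
UNDEFINED_B = [
    ("S", [["A"], ["B"]]),
    ("A", [["'a'"]]),
]

for rules in (TWO_RULES_FOR_S, UNDEFINED_B):
    print(satisfies_first_rule(rules), satisfies_second_rule(rules))
```

The first grammar passes the first check and fails the second; the
second grammar does the reverse.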


It may be observed that many formal treatments of grammars get by without 
imposing either of these rules.  Undefined nonterminals are necessarily 
unproductive, and multiple production rules for the same nonterminal just provide
alternative definitions.

Under the rubric ‘Hygiene in grammars’, Grune and Jacobs observe several things
that usually indicate problems:  

  - references to undefined nonterminals
  - rules for unreachable nonterminals
  - unproductive nonterminals
  - loops (in which a nonterminal N can generate N as a sentential form)

The first three G and J call ‘useless nonterminals’ because they will never be
used in a parse tree.  (And I notice that ‘multiple rules for the same nonterminal’
does not appear in their list of hygiene problems at all.)
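
As an illustration of what detecting these four conditions involves,
here is a minimal Python sketch.  The grammar encoding (a dict from
nonterminal to alternatives, with literals written with a leading quote),
the example grammar, and the function names are all assumptions of the
sketch; it is not taken from Grune and Jacobs or from any ixml processor.

```python
def is_literal(symbol):
    # Assumption of this sketch: literals are written with a leading quote.
    return symbol.startswith("'")


def undefined(grammar):
    """Nonterminals referenced on a right-hand side but never defined."""
    return {s for alts in grammar.values() for alt in alts for s in alt
            if not is_literal(s) and s not in grammar}


def unreachable(grammar, start):
    """Defined nonterminals that cannot be reached from the start symbol."""
    seen, todo = set(), [start]
    while todo:
        name = todo.pop()
        if name in seen or name not in grammar:
            continue
        seen.add(name)
        todo.extend(s for alt in grammar[name] for s in alt if not is_literal(s))
    return set(grammar) - seen


def closure(grammar, symbol_ok):
    """Names having an alternative whose symbols are all OK or in the set."""
    done, changed = set(), True
    while changed:
        changed = False
        for name, alts in grammar.items():
            if name not in done and any(
                    all(symbol_ok(s) or s in done for s in alt) for alt in alts):
                done.add(name)
                changed = True
    return done


def unproductive(grammar):
    """Defined nonterminals that cannot derive any string of literals."""
    return set(grammar) - closure(grammar, is_literal)


def loops(grammar):
    """Nonterminals N that can derive N itself as a sentential form."""
    nullable = closure(grammar, lambda s: False)
    # step[N]: nonterminals M such that N derives M alone in one step,
    # because every other symbol in the alternative can vanish.
    step = {name: set() for name in grammar}
    for name, alts in grammar.items():
        for alt in alts:
            for i, s in enumerate(alt):
                rest = alt[:i] + alt[i + 1:]
                if s in grammar and all(r in nullable for r in rest):
                    step[name].add(s)
    looping = set()
    for name in grammar:
        seen, todo = set(), list(step[name])
        while todo:
            m = todo.pop()
            if m not in seen:
                seen.add(m)
                todo.extend(step[m])
        if name in seen:
            looping.add(name)
    return looping


GRAMMAR = {
    "S": [["A"], ["B"], ["L"], ["P"]],  # B is never defined
    "A": [["'a'"]],
    "L": [["L"], ["'l'"]],              # L can derive L
    "P": [["P", "'p'"]],                # P never terminates
    "U": [["'u'"]],                     # U is never referenced
}

print(undefined(GRAMMAR), unreachable(GRAMMAR, "S"),
      unproductive(GRAMMAR), loops(GRAMMAR))
```

Each check is a small fixed-point or graph-reachability computation over
the grammar alone, so none of them depends on any particular input.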

I think three conflicting principles are at issue here:

1 None of these things is necessary and each of them is likely to be an error
on the part of a human grammar writer.  For any grammar with undefined, 
unreachable, or unproductive nonterminals, or loops, an equivalent grammar
accepting the same set of strings exists.  For all but loops, there is also 
an equivalent grammar that has the same set of parse trees.

2 Compared to other grammar-related tools or methods (yacc and friends,
recursive-descent parsing, …), invisible XML makes far fewer demands on
grammars:  we do not require the grammars to be LL(1) or LL(k) or LALR(1)
or anything of the kind.  If it satisfies a minimal set of rules for the syntax of
grammars, an invisible XML processor can handle it.

3 If we are going to be in the business of flagging hygiene problems in 
grammars, it’s probably better to be consistent than to be inconsistent.

The first principle suggests that ixml processors are going to be more
useful if they alert grammar writers to useless nonterminals and loops.

The second principle suggests that if we make them errors we will lose
some of what makes ixml distinctive:  it turns out we don’t accept arbitrary
grammars, only relatively clean arbitrary grammars.

The third principle suggests that if we want to alert people to one form
of useless nonterminal we should consider alerting them to the others.


My current view is that ideally ixml processors should be required
to reject grammars only if the grammar is really unusable, and that
ixml processors should be encouraged to report hygiene issues with
warnings, not errors; also that if we are going to encourage warnings for
one form of useless nonterminal we should encourage warnings for all.


> 
> 3.
> 
> For the following rule, 
>  A processor conforms to this specification if it accepts grammars in ixml form and uses those grammars to parse input and produce XML documents ...  A conforming processor must not accept non-conforming grammars.
> 
> I propose the wording "A conforming processor must accept grammars in ixml form, and use them to parse input and produce XML documents ... "
> 
> An option would be "A conforming processor must accept grammars in ixml form, and should accept them in XML form, and use them ..." Do we have an opinion?

One or the other of my grammar teachers would tell me to lose the comma
as it’s a compound predicate not a compound sentence.  

My main concern here is that the rule is one of a sequence of three in the
conformance section, all with the form

    A &possibly-conforming-object; conforms to this specification if:

    &list-of-conditions;

and so all offering a summary of sufficient conditions for conformance by
objects of particular classes.  I am reluctant to lose that parallelism.

It may be obvious to some readers that any processor which does what
the spec says it ‘must’ do and refrains from doing what the spec says it
‘must not’ do will or should count as ‘conforming’, but I suspect that it
seems more obvious to people who have spent years of their lives working
in standards development and may not be obvious to everyone who reads
the spec.

Perhaps I am particularly sensitive to this just now, because I have spent
six weeks trying to figure out the relation between the “precincts” in 
some sets of GIS data and the “voting tabulation districts” in some other
datasets, and have thus far neither found an explanation in any public 
documentation nor succeeded in getting any answer from anyone with
authoritative knowledge.  I have begun to conjecture that at a crucial 
moment they said to themselves “well, it’s obvious that the ‘precinct’
dataset describes the old precinct lines and the ‘VTD’ dataset describes
the new precinct lines, it really does not need to be stated explicitly”,
or more likely that they think of it as so obvious that they are unconscious
of the fact that it is not stated anywhere.

Writing our specs to be understood by people who were not in the 
room is so obvious a point that no one will disagree, and so it’s usually 
unhelpful to bring it up as a principle.  I think this particular edit risks 
making things less clear to people who are not now in the room.  

Without the existing sentence, how does someone not familiar with
the conventions of spec prose find out what it means to say that 
XYZ is a conforming processor for invisible XML?


> 
> 4.
> 
> I have a problem with the third requirement in this list:
> 
> For any conforming grammar and any input, processors must:
> 
> * parse the input using the grammar specified, and produce an XML document representing a parse tree for the input, or
> * establish that the input is not described by the grammar, and produce an XML document reporting that fact, or
> * fail for whatever reason (e.g. because available resource limits were exceeded).
> 
> since it allows a processor that always fails to be conformant.
> 
> I'm in favour of dropping the third requirement.

If a processor must parse whatever input I give it and succeed in 
producing either an appropriate XML output or a correct statement
that the input is not a sentence in the grammar, then doesn’t it 
follow that a processor that fails for lack of memory is non-conforming?

I gave it the input and the grammar, and it neither parsed the input
nor told me that the input is not a sentence.  That seems to me to
mean it failed the conformance requirements.

I agree that a script reading

    echo "Out of resources; failed to complete the parse."

is not a helpful implementation of invisible XML.  But the definition
of conformance can’t require implementations never to fail, can it?

I believe the third clause was copied from, or at least inspired by, a
corresponding conformance clause in the Pascal Report.

Is there a way to allow conformant processors which sometimes fail 
without allowing conformant processors which always fail?

And if there isn’t, so that we must either forbid failure at all times or
allow the script shown above, then which situation is better handled
by an appeal to quality of implementation?  I think it’s better to say
that among conforming processors one will prefer processors which
sometimes work, than to say that there are no conforming processors
and among the non-conforming processors one should prefer
the ones that would be conforming if the rules for conformance 
were a little different.

Michael
Received on Friday, 3 December 2021 16:04:37 UTC