Re: Conformance section

Looking at this again, I think it might be more useful if I made
concrete suggestions.

> On 3,Dec2021, at 9:05 AM, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
> 
> [Language-pedantry alert.  Proceed at your own risk.]
> 
>> On 3,Dec2021, at 3:39 AM, Steven Pemberton <steven.pemberton@cwi.nl> wrote:
>> 
>> In the final sweep to a release version, I would like us to resolve these questions in the conformance section:
>> 1.
>> 
>> I propose deleting one of these rules, since I believe they are equivalent:
>> 
>> * All rule names that are serialised must match the requirements for an XML name.
>> * All nonterminal names which are marked to be serialised must match the requirements of an XML name.
> 
> I think they are not equivalent for a grammar like
> 
>  S :  A; B.
>  A: 'a'.
>  B : 'b'.
> 
> ...
> 
> On a side note, perhaps for ‘nonterminal names’ we could everywhere just read
> ‘nonterminals’?

So maybe:

  * All nonterminals marked to be serialized must match the requirements of XML names.

or

  * All nonterminals marked to be serialized must match the Name production in the XML specification.
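
To make the second wording concrete, here is a minimal Python sketch
of such a check.  The character ranges are transcribed from the Name
production of XML 1.0 (Fifth Edition).  The function names and the
representation of a grammar as a mapping from nonterminal to mark
('^', '@', or '-') are assumptions made for illustration only; a real
processor would also have to take marks on references into account.

```python
import re

# NameStartChar ranges from the XML 1.0 (Fifth Edition) Name production.
NAME_START = (
    "A-Za-z_:"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# NameChar adds '.', digits, U+00B7, combining marks, U+203F-U+2040, and '-'.
NAME_CHAR = NAME_START + ".0-9\u00B7\u0300-\u036F\u203F-\u2040\\-"

XML_NAME = re.compile(f"[{NAME_START}][{NAME_CHAR}]*")


def is_xml_name(name):
    """True if the whole string matches the XML Name production."""
    return XML_NAME.fullmatch(name) is not None


def unserializable(marks):
    """Names marked to be serialized that are not XML names.

    `marks` maps each nonterminal to its mark (hypothetical
    representation); '-' means the nonterminal is not serialized.
    """
    return sorted(
        name for name, mark in marks.items()
        if mark != "-" and not is_xml_name(name)
    )
```

Under this sketch a hidden nonterminal is never reported, whatever
its name, which is the distinction the two candidate wordings are
trying to capture.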

> 
> 
>> 
>> 2.
>> 
>> I propose deleting the second rule here, since I believe the first one covers it:
>> 
>> * For every nonterminal name occurring on the right-hand side of a rule, exactly one rule defining that name must exist in the grammar.
>> * The grammar must not contain more than one rule defining any given name.
> 
> This grammar seems to me to satisfy the first but not the second rule:
> 
>  S:  A.
>  S:  B.
>  A: 'a'.
>  B : 'b'.
> 
> This grammar, on the other hand, seems to me to satisfy the second but not the first rule:
> 
>  S:  A; B.
>  A: 'a'.
> 
> So I do not currently believe that either rule entails the other.  There would be
> less redundancy if “exactly one rule” in the first item were changed to “some rule”
> or “at least one rule”.
> 
> 
> ...
> I think three conflicting principles are at issue here:
> 
> 1 None of these things is necessary and each of them is likely to be an error
> on the part of a human grammar writer.  ...
> 
> 2 Compared to other grammar-related tools or methods (yacc and friends,
> recursive-descent parsing, …), invisible XML makes far fewer demands on
> grammars:  ... If it satisfies a minimal set of rules for the syntax of
> grammars, an invisible XML processor can handle it.
> 
> 3 If we are going to be in the business of flagging hygiene problems in 
> grammars, it’s probably better to be consistent than to be inconsistent.
> 
> ...
> 
> My current view is that I think ideally ixml processors should be required
> to reject grammars only if the grammar is really unusable, and that 
> ixml processors should be encouraged to report hygiene issues with 
> warnings not errors; also that if we are going to encourage warnings for
> one form of useless nonterminal we should encourage warnings for all.

Looking at the spec, I believe that defining productive nonterminals
would require a lot of new machinery (and lead to difficulties with
rules like X: [].), and similarly for loops.  So I am going to abandon
much of my third principle (at least, at the level of the spec; I
still hope that processors will check grammars for unproductive
nonterminals and loops and warn people about them, I just don’t want
to try to write that into the spec).

So my first proposal is this one.

Proposal A: loosen hygiene requirements to recommendations, and
include reachability as a recommendation (but not productivity or
freedom from loops).

- In the Rules section, add at the end

        In the usual case, every rule in the grammar should be
        reachable directly or indirectly from the root symbol of the
        grammar; processors should issue warnings if any rules in the
        grammar are not reachable.

- In the Nonterminals section, delete

        This name refers to the rule that defines this name, which
        must exist, and there must only be one such rule.

and replace it with

        This name refers to the rule that defines this name, which
        should exist, and there should only be one such rule.
        Processors should issue warnings if no such rule exists, or if
        more than one such rule exists.

- In the Conformance section, replace the rules quoted with

      • For every nonterminal occurring in the grammar, there should 
        be exactly one rule in the grammar defining that name.
 
      • Every nonterminal occurring in the grammar should be reachable
        from the root symbol.


If people disagree either on making these warnings rather than errors,
or on adding reachability, then I would propose these alternatives.

Proposal B: make undefined and unreachable nonterminals errors.

- In the Rules section, add at the end

        In the usual case, every rule in the grammar must be reachable
        directly or indirectly from the root symbol of the grammar.

- Leave Nonterminals section alone.

- In the Conformance section, replace the rules quoted with

      • For every nonterminal occurring in the grammar, there must 
        be exactly one rule in the grammar defining that name.
 
      • Every nonterminal occurring in the grammar must be reachable
        from the root symbol.

Proposal C: prohibit undefined nonterminals but not unreachable
nonterminals.

- Leave Rules section alone.

- Leave Nonterminals section alone.

- In the Conformance section, replace the two rules quoted with

      • For every nonterminal occurring in the grammar, there must be
        exactly one rule in the grammar defining that name.
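
For comparison, the three proposals differ only in how severely each
kind of finding is treated.  A minimal Python sketch of that
difference follows; the table is my reading of the proposals above,
and the names SEVERITY and must_reject are hypothetical.

```python
# Severity of each finding under each proposal, as read from the
# proposal texts above.  None means the proposal says nothing about it.
SEVERITY = {
    "A": {"undefined": "warning", "multiply-defined": "warning",
          "unreachable": "warning"},
    "B": {"undefined": "error", "multiply-defined": "error",
          "unreachable": "error"},
    "C": {"undefined": "error", "multiply-defined": "error",
          "unreachable": None},
}


def must_reject(proposal, findings):
    """True if any finding is an error under the given proposal."""
    return any(SEVERITY[proposal][kind] == "error" for kind in findings)


print(must_reject("A", ["undefined", "unreachable"]))
print(must_reject("C", ["unreachable"]))
print(must_reject("B", ["unreachable"]))
```

Under Proposal A no hygiene finding ever forces rejection; under
Proposal C an unreachable nonterminal is simply outside the scope of
the conformance rules.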



> 
> 
>> 
>> 3.
>> 
>> For the following rule, 
>>  A processor conforms to this specification if it accepts grammars in ixml form and uses those grammars to parse input and produce XML documents ...  A conforming processor must not accept non-conforming grammars.
>> 
>> I propose the wording "A conforming processor must accept grammars in ixml form, and use them to parse input and produce XML documents ... "
>> 
>> An option would be "A conforming processor must accept grammars in ixml form, and should accept them in XML form, and use them ..." Do we have an opinion?
> 
> …

In addition to what I said in the earlier mail, I realize now that SP’s edit 
removes the explicit statement that conforming processors must reject
nonconforming grammars.  That’s a design choice a spec can make, but
I thought we had made the choice that requires flagging errors.  

If there is only a requirement to accept conforming grammars and process 
them correctly, then implicitly a processor’s behavior in the face of
non-conforming grammars is undefined and unconstrained.  If that is
so, then a conforming processor can add arbitrary new
constructs to the language and relax any and all constraints, and
there is no guarantee that a grammar that works with one processor
will work with others.  

So I think omitting the “must not accept non-conforming grammars”
requirement is a major design change.

> 
> 
>> 
>> 4.
>> 
>> I have a problem with the third requirement in this list:
>> 
>> For any conforming grammar and any input, processors must:
>> * parse the input using the grammar specified, and produce an XML document representing a parse tree for the input, or
>> * establish that the input is not described by the grammar, and produce an XML document reporting that fact, or
>> * fail for whatever reason (e.g. because available resource limits were exceeded).
>> 
>> since it allows a processor that always fails to be conformant.
>> 
>> I'm in favour of dropping the third requirement.
> 
> If a processor must parse whatever input I give it and succeed in 
> producing either an appropriate XML output or a correct statement
> that the input is not a sentence in the grammar, then doesn’t it 
> follow that a processor that fails for lack of memory is non-conforming?

I have no useful suggestions here to make the third item in the list
more palatable.  I have checked the Pascal Report (which I seem to
be regarding as an example of good careful specification — it may not
be perfect but it does seem to me to be pretty good) but do not find
anything there about what happens when resources are exceeded or
other failures outside the processor’s control occur.  Maybe that’s
a sign that my fears about this issue are ill founded.

Hmm.  I think the reason the third item seems necessary to me is the
introductory “For any conforming grammar and any input”.  But I
have been unsuccessful in attempts to reword the sentence.

Michael

Received on Friday, 3 December 2021 17:52:37 UTC