- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Fri, 3 Dec 2021 10:53:25 -0700
- To: Steven Pemberton <steven.pemberton@cwi.nl>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, ixml <public-ixml@w3.org>
Looking at this again, I think it might be more useful if I made concrete suggestions.

> On 3,Dec2021, at 9:05 AM, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>
> [Language-pedantry alert. Proceed at your own risk.]
>
>> On 3,Dec2021, at 3:39 AM, Steven Pemberton <steven.pemberton@cwi.nl> wrote:
>>
>> In the final sweep to a release version, I would like us to resolve these questions in the conformance section:
>> 1.
>>
>> I propose deleting one of these rules, since I believe they are equivalent:
>>
>> * All rule names that are serialised must match the requirements for an XML name.
>> * All nonterminal names which are marked to be serialised must match the requirements of an XML name.
>
> I think they are not equivalent for a grammar like
>
> S : A; B.
> A: ‘a’.
> B : ‘b’.
>
> ...
>
> On a side note, perhaps for ’nonterminal names’ we could everywhere just read
> ’nonterminals’ ?

So maybe:

* All nonterminals marked to be serialized must match the requirements of XML names.

or

* All nonterminals marked to be serialized must match the Name production in the XML specification.

>
>
>>
>> 2.
>>
>> I propose deleting the second rule here, since I believe the first one covers it:
>>
>> * For every nonterminal name occurring on the right-hand side of a rule, exactly one rule defining that name must exist in the grammar.
>> * The grammar must not contain more than one rule defining any given name.
>
> This grammar seems to me to satisfy the first but not the second rule:
>
> S: A.
> S: B.
> A: ‘a’.
> B : ‘b’.
>
> This grammar, on the other hand, seems to me to satisfy the second but not the first rule:
>
> S: A; B.
> A: ‘a’.
>
> So I do not currently believe that either rule entails the other. There would be
> less redundancy if “exactly one rule” in the first item were changed to “some rule”
> or “at least one rule”.
>
>
> ...
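The independence of the two rules in item 2 can be checked mechanically. What follows is only a minimal sketch, not anything taken from the spec: it assumes a grammar is modelled as a list of (defined name, referenced names) pairs, and the function names are made up for illustration.

```python
from collections import Counter

# Hypothetical model: a grammar is a list of (defined_name, referenced_names)
# pairs, one per rule. Terminals are left out because neither constraint
# mentions them.

def rhs_names_defined_exactly_once(rules):
    """First constraint: every name used on a right-hand side has exactly one defining rule."""
    counts = Counter(name for name, _ in rules)
    return all(counts[ref] == 1 for _, refs in rules for ref in refs)

def no_name_defined_twice(rules):
    """Second constraint: no name has more than one defining rule."""
    counts = Counter(name for name, _ in rules)
    return all(n <= 1 for n in counts.values())

# S: A.  S: B.  A: 'a'.  B: 'b'.
duplicate_s = [("S", ["A"]), ("S", ["B"]), ("A", []), ("B", [])]
# S: A; B.  A: 'a'.
missing_b = [("S", ["A", "B"]), ("A", [])]

for label, grammar in (("duplicate_s", duplicate_s), ("missing_b", missing_b)):
    print(label, rhs_names_defined_exactly_once(grammar), no_name_defined_twice(grammar))
```

Run as written, this prints `duplicate_s True False` and `missing_b False True`: each grammar passes one constraint and fails the other, so neither constraint entails the other.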
> I think three conflicting principles are at issue here:
>
> 1 None of these things is necessary and each of them is likely to be an error
> on the part of a human grammar writer. ...
>
> 2 Compared to other grammar-related tools or methods (yacc and friends,
> recursive-descent parsing, …), invisible XML makes much fewer demands on
> grammars: ... If it satisfies a minimal set of rules for the syntax of
> grammars, an invisible XML processor can handle it.
>
> 3 If we are going to be in the business of flagging hygiene problems in
> grammars, it’s probably better to be consistent than to be inconsistent.
>
> ...
>
> My current view is that I think ideally ixml processors should be required
> to reject grammars only if the grammar is really unusable, and that
> ixml processors should be encouraged to report hygiene issues with
> warnings not errors; also that if we are going to encourage warnings for
> one form of useless nonterminal we should encourage warnings for all.

Looking at the spec, I believe that defining productive nonterminals would require a lot of new machinery (and lead to difficulties with rules like X: [].), and similarly for loops. So I am going to abandon much of my third principle (at least, at the level of the spec; I still hope that processors will check grammars for unproductive nonterminals and loops and warn people about them, I just don’t want to try to write that into the spec).

So my first proposal is this one.

Proposal A: loosen hygiene requirements to recommendations, and include reachability as a recommendation (but not productivity or freedom from loops).

- In the Rules section, add at the end

    In the usual case, every rule in the grammar should be reachable directly or indirectly from the root symbol of the grammar; processors should issue warnings if any rules in the grammar are not reachable.

- In the Nonterminals section, delete

    This name refers to the rule that defines this name, which must exist, and there must only be one such rule.

  and replace it with

    This name refers to the rule that defines this name, which should exist, and there should only be one such rule. Processors should issue warnings if no such rule exists, or if more than one such rule exists.

- In the Conformance section, replace the rules quoted with

    • For every nonterminal occurring in the grammar, there should be exactly one rule in the grammar defining that name.
    • Every nonterminal occurring in the grammar should be reachable from the root symbol.

If people disagree either on making these warnings rather than errors, or on adding reachability, then I would propose these alternatives.

Proposal B: make undefined and unreachable nonterminals errors.

- In the Rules section, add at the end

    In the usual case, every rule in the grammar must be reachable directly or indirectly from the root symbol of the grammar.

- Leave Nonterminals section alone.

- In the Conformance section, replace the rules quoted with

    • For every nonterminal occurring in the grammar, there must be exactly one rule in the grammar defining that name.
    • Every nonterminal occurring in the grammar must be reachable from the root symbol.

Proposal C: prohibit undefined nonterminals but not unreachable nonterminals.

- Leave Rules section alone.

- Leave Nonterminals section alone.

- In the Conformance section, replace the two rules quoted with

    • For every nonterminal occurring in the grammar, there must be exactly one rule in the grammar defining that name.

>
>
>>
>> 3.
>>
>> For the following rule,
>> A processor conforms to this specification if it accepts grammars in ixml form and uses those grammars to parse input and produce XML documents ... A conforming processor must not accept non-conforming grammars.
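Proposals A, B and C apply the same two checks and differ only in how severely a failure is reported. A rough sketch of that difference follows. It assumes a grammar is modelled as a list of (defined name, referenced names) pairs and that the first rule's name is the root symbol. The function `check_grammar` and its message texts are invented for illustration and are not proposed spec wording.

```python
from collections import Counter

def check_grammar(rules, proposal):
    """Return (errors, warnings) for a grammar under hypothetical policy "A", "B" or "C".

    rules is a list of (defined_name, referenced_names) pairs; the first
    rule's name is taken to be the root symbol.
    """
    root = rules[0][0]
    counts = Counter(name for name, _ in rules)
    used = set(counts) | {ref for _, refs in rules for ref in refs}

    # Definition check: every nonterminal occurring in the grammar should
    # have exactly one defining rule.
    defs = [f"{n}: {counts[n]} defining rules" for n in sorted(used) if counts[n] != 1]

    # Reachability check: follow references outward from the root symbol.
    refs_of = {}
    for name, refs in rules:
        refs_of.setdefault(name, set()).update(refs)
    seen, todo = set(), [root]
    while todo:
        n = todo.pop()
        if n not in seen:
            seen.add(n)
            todo.extend(refs_of.get(n, ()))
    unreachable = [f"{n}: not reachable from {root}" for n in sorted(used - seen)]

    if proposal == "A":   # both checks are recommendations: warnings only
        return [], defs + unreachable
    if proposal == "B":   # both checks are requirements: errors
        return defs + unreachable, []
    return defs, []       # proposal C: reachability is not checked at all
```

For the grammar `S: A. A: 'a'. C: 'c'.`, written here as `[("S", ["A"]), ("A", []), ("C", [])]`, this yields one warning about C under A, one error about C under B, and nothing under C.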
>>
>> I propose the wording "A conforming processor must accept grammars in ixml form, and use them to parse input and produce XML documents ... "
>>
>> An option would be "A conforming processor must accept grammars in ixml form, and should accept them in XML form, and use them ..." Do we have an opinion?
>
> …

In addition to what I said in the earlier mail, I realize now that SP’s edit removes the explicit statement that conforming processors must reject nonconforming grammars. That’s a design choice a spec can make, but I thought we had made the choice that requires flagging errors. If there is only a requirement to accept conforming grammars and process them correctly, then implicitly a processor’s behavior in the face of non-conforming grammars is undefined and unconstrained. If that is so, then a conforming processor can add arbitrary new constructs to the language and relax any and all constraints, and there is no guarantee that a grammar that works with one processor will work with others. So I think omitting the “must not accept non-conforming grammars” is a major design change.

>
>
>>
>> 4.
>>
>> I have a problem with the third requirement in this list:
>>
>> For any conforming grammar and any input, processors must:
>> * parse the input using the grammar specified, and produce an XML document representing a parse tree for the input, or
>> * establish that the input is not described by the grammar, and produce an XML document reporting that fact, or
>> * fail for whatever reason (e.g. because available resource limits were exceeded).
>>
>> since it allows a processor that always fails to be conformant.
>>
>> I'm in favour of dropping the third requirement.
>
> If a processor must parse whatever input I give it and succeed in
> producing either an appropriate XML output or a correct statement
> that the input is not a sentence in the grammar, then doesn’t it
> follow that a processor that fails for lack of memory is non-conforming?

I have no useful suggestions here to make the third item in the list more palatable. I have checked the Pascal Report (which I seem to be regarding as an example of good careful specification — it may not be perfect but it does seem to me to be pretty good) but do not find anything there about what happens when resources are exceeded or other failures outside the processor’s control occur. Maybe that’s a sign that my fears about this issue are ill founded.

Hmm. I think the reason the third item seems necessary to me is the introductory “For any conforming grammar and any input”. But I have been unsuccessful in attempts to reword the sentence.

Michael

Received on Friday, 3 December 2021 17:52:37 UTC