- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Fri, 3 Dec 2021 09:05:26 -0700
- To: Steven Pemberton <steven.pemberton@cwi.nl>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, ixml <public-ixml@w3.org>
[Language-pedantry alert. Proceed at your own risk.]

> On 3,Dec2021, at 3:39 AM, Steven Pemberton <steven.pemberton@cwi.nl> wrote:
>
> In the final sweep to a release version, I would like us to resolve these questions in the conformance section:
>
> 1.
>
> I propose deleting one of these rules, since I believe they are equivalent:
>
> * All rule names that are serialised must match the requirements for an XML name.
> * All nonterminal names which are marked to be serialised must match the requirements of an XML name.

I think they are not equivalent for a grammar like

    S: A; B.
    A: 'a'.
    B: 'b'.

Given the input ‘a’, I think the first formulation requires that the names ‘S’ and ‘A’ be checked to see if they match the requirements of an XML name, but not ‘B’. I think the second formulation requires that all three nonterminal names be checked.

That is, the first rule appears to require checking only the names of nonterminals which are in fact serialized in a given run, and the second rule does not have this limitation.

On a side note, perhaps for ‘nonterminal names’ we could everywhere just read ‘nonterminals’?

> 2.
>
> I propose deleting the second rule here, since I believe the first one covers it:
>
> * For every nonterminal name occurring on the right-hand side of a rule, exactly one rule defining that name must exist in the grammar.
> * The grammar must not contain more than one rule defining any given name.

This grammar seems to me to satisfy the first but not the second rule:

    S: A.
    S: B.
    A: 'a'.
    B: 'b'.

This grammar, on the other hand, seems to me to satisfy the second but not the first rule:

    S: A; B.
    A: 'a'.

So I do not currently believe that either rule entails the other. There would be less redundancy if “exactly one rule” in the first item were changed to “some rule” or “at least one rule”.

It may be observed that many formal treatments of grammars get by without imposing either of these rules.
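For anyone who wants to check the two counterexamples under item 2 mechanically, here is a minimal sketch in Python. The grammar representation (a list of name/alternatives pairs, with quoted strings as terminals) and the function names are assumptions made for the illustration; nothing here is ixml syntax or taken from any ixml processor.

```python
from collections import Counter

# Assumed representation for this sketch (not ixml syntax): a grammar is a
# list of (name, alternatives) rules, each alternative a list of symbols.
# Symbols starting with a quote are terminals; all others are nonterminals.

def is_nonterminal(symbol):
    return not symbol.startswith("'")

def rule_counts(grammar):
    """Count how many rules define each name."""
    return Counter(name for name, _ in grammar)

def satisfies_first_rule(grammar):
    """Every nonterminal used on a right-hand side has exactly one rule."""
    counts = rule_counts(grammar)
    used = {s for _, alts in grammar for alt in alts for s in alt
            if is_nonterminal(s)}
    return all(counts[name] == 1 for name in used)

def satisfies_second_rule(grammar):
    """No name is defined by more than one rule."""
    return all(n <= 1 for n in rule_counts(grammar).values())

# S: A. S: B. A: 'a'. B: 'b'.
g1 = [("S", [["A"]]), ("S", [["B"]]), ("A", [["'a'"]]), ("B", [["'b'"]])]

# S: A; B. A: 'a'.
g2 = [("S", [["A"], ["B"]]), ("A", [["'a'"]])]

print(satisfies_first_rule(g1), satisfies_second_rule(g1))  # True False
print(satisfies_first_rule(g2), satisfies_second_rule(g2))  # False True
```

In the first grammar only S has two rules, and S never occurs on a right-hand side, so the first rule is silent about it. In the second grammar B is used but has no rule at all, which the second rule does not notice.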
Undefined nonterminals are necessarily unproductive, and multiple production rules for the same nonterminal just provide alternative definitions.

Under the rubric ‘Hygiene in grammars’, Grune and Jacobs observe several things that usually indicate problems:

- references to undefined nonterminals
- rules for unreachable nonterminals
- unproductive nonterminals
- loops (in which a nonterminal N can generate N as a sentential form)

The first three G and J call ‘useless nonterminals’ because they will never be used in a parse tree. (And I notice that ‘multiple rules for the same nonterminal’ does not appear in their list of hygiene problems at all.)

I think three conflicting principles are at issue here:

1 None of these things is necessary and each of them is likely to be an error on the part of a human grammar writer. For any grammar with undefined, unreachable, or unproductive nonterminals, or loops, an equivalent grammar accepting the same set of strings exists. For all but loops, there is also an equivalent grammar that has the same set of parse trees.

2 Compared to other grammar-related tools or methods (yacc and friends, recursive-descent parsing, …), invisible XML makes far fewer demands on grammars: we do not require the grammars to be LL(1) or LL(k) or LALR(1) or anything of the kind. If it satisfies a minimal set of rules for the syntax of grammars, an invisible XML processor can handle it.

3 If we are going to be in the business of flagging hygiene problems in grammars, it’s probably better to be consistent than to be inconsistent.

The first principle suggests that ixml processors are going to be more useful if they alert grammar writers to useless nonterminals and loops. The second principle suggests that if we make them errors we will lose some of what makes ixml distinctive: it turns out we don’t accept arbitrary grammars, only relatively clean arbitrary grammars.
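To make the three kinds of useless nonterminal concrete, here is a rough Python sketch of how a processor might detect them in order to issue warnings. The dictionary representation, the function name `hygiene_report`, and the sample grammar are all assumptions for the illustration; loop detection is left out because it also needs nullability analysis.

```python
# Assumed representation (not ixml syntax): a dict mapping each nonterminal
# to its list of alternatives, each alternative a list of symbols. Symbols
# starting with a quote are terminals. Multiple rules for one nonterminal
# are simply merged into one list of alternatives.

def is_nonterminal(symbol):
    return not symbol.startswith("'")

def hygiene_report(rules, start):
    defined = set(rules)
    referenced = {s for alts in rules.values() for alt in alts for s in alt
                  if is_nonterminal(s)}

    # Reachable: everything that can be reached from the start symbol.
    reachable, todo = set(), [start]
    while todo:
        name = todo.pop()
        if name in reachable:
            continue
        reachable.add(name)
        for alt in rules.get(name, []):
            todo.extend(s for s in alt if is_nonterminal(s))

    # Productive: least fixpoint of "some alternative consists only of
    # terminals and nonterminals already known to be productive".
    productive, changed = set(), True
    while changed:
        changed = False
        for name, alts in rules.items():
            if name not in productive and any(
                    all(not is_nonterminal(s) or s in productive for s in alt)
                    for alt in alts):
                productive.add(name)
                changed = True

    return {
        "undefined": sorted(referenced - defined),
        "unreachable": sorted(defined - reachable),
        "unproductive": sorted((defined | referenced) - productive),
    }

# S: A; B; C.  A: 'a'.  C: C, 'c'.  D: 'd'.
sample = {
    "S": [["A"], ["B"], ["C"]],
    "A": [["'a'"]],
    "C": [["C", "'c'"]],
    "D": [["'d'"]],
}
report = hygiene_report(sample, "S")
print(report)
```

For this sample, B is reported as both undefined and unproductive (an undefined nonterminal can never be productive), C as unproductive because its only alternative refers to itself, and D as unreachable. The grammar still describes the string ‘a’, which is the argument for treating such findings as warnings rather than errors.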
The third principle suggests that if we want to alert people to one form of useless nonterminal we should consider alerting them to the others.

My current view is that ideally ixml processors should be required to reject grammars only if the grammar is really unusable, and that ixml processors should be encouraged to report hygiene issues with warnings, not errors; also that if we are going to encourage warnings for one form of useless nonterminal we should encourage warnings for all.

> 3.
>
> For the following rule,
>
> A processor conforms to this specification if it accepts grammars in ixml form and uses those grammars to parse input and produce XML documents ... A conforming processor must not accept non-conforming grammars.
>
> I propose the wording "A conforming processor must accept grammars in ixml form, and use them to parse input and produce XML documents ... "
>
> An option would be "A conforming processor must accept grammars in ixml form, and should accept them in XML form, and use them ..." Do we have an opinion?

One or the other of my grammar teachers would tell me to lose the comma as it’s a compound predicate not a compound sentence.

My main concern here is that the rule is one of a sequence of three in the conformance section, all with the form

    A &possibly-conforming-object; conforms to this specification if: &list-of-conditions;

and so all offering a summary of sufficient conditions for conformance by objects of particular classes. I am reluctant to lose that parallelism.

It may be obvious to some readers that any processor which does what the spec says it ‘must’ do and refrains from doing what the spec says it ‘must not’ do will or should count as ‘conforming’, but I suspect that it seems more obvious to people who have spent years of their lives working in standards development and may not be obvious to everyone who reads the spec.
Perhaps I am particularly sensitive to this just now, because I have spent six weeks trying to figure out the relation between the “precincts” in some sets of GIS data and the “voting tabulation districts” in some other datasets, and have thus far neither found an explanation in any public documentation nor succeeded in getting any answer from anyone with authoritative knowledge. I have begun to conjecture that at a crucial moment they said to themselves “well, it’s obvious that the ‘precinct’ dataset describes the old precinct lines and the ‘VTD’ dataset describes the new precinct lines, it really does not need to be stated explicitly”, or more likely that they think of it as so obvious that they are unconscious of the fact that it is not stated anywhere.

Writing our specs to be understood by people who were not in the room is so obvious a point that no one will disagree, and so it’s usually unhelpful to bring it up as a principle. I think this particular edit risks making things less clear to people who are not now in the room. Without the existing sentence, how does someone not familiar with the conventions of spec prose find out what it means to say that XYZ is a conforming processor for invisible XML?

> 4.
>
> I have a problem with the third requirement in this list:
>
> For any conforming grammar and any input, processors must:
> * parse the input using the grammar specified, and produce an XML document representing a parse tree for the input, or
> * establish that the input is not described by the grammar, and produce an XML document reporting that fact, or
> * fail for whatever reason (e.g. because available resource limits were exceeded).
>
> since it allows a processor that always fails to be conformant.
>
> I'm in favour of dropping the third requirement.
If a processor must parse whatever input I give it and succeed in producing either an appropriate XML output or a correct statement that the input is not a sentence in the grammar, then doesn’t it follow that a processor that fails for lack of memory is non-conforming? I gave it the input and the grammar, and it neither parsed the input nor told me that the input is not a sentence. That seems to me to mean it failed the conformance requirements.

I agree that a script reading

    echo "Out of resources; failed to complete the parse."

is not a helpful implementation of invisible XML. But the definition of conformance can’t require implementations never to fail, can it?

I believe the third clause was copied from, or at least inspired by, a corresponding conformance clause in the Pascal Report.

Is there a way to allow conformant processors which sometimes fail without allowing conformant processors which always fail? And if there isn’t, so we must either forbid failure at all times or allow the script shown above, then which situation is better handled by an appeal to quality of implementation? I think it’s better to say that among conforming processors one will prefer processors which sometimes work, than to say that there are no conforming processors and among the non-conforming processors one should prefer the ones that would be conforming if the rules for conformance were a little different.

Michael
Received on Friday, 3 December 2021 16:04:37 UTC