ACTION: Michael to comment on conformance section of new draft from C. M. Sperberg-McQueen on 2021-04-14 (public-ixml@w3.org from April 2021)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Tue, 13 Apr 2021 18:41:41 -0600
To: public-ixml@w3.org
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Message-Id: <BED068D9-C354-488C-A986-E49A801C1006@blackmesatech.com>

On this morning's call I took an action to look at the conformance section and report on it. It seems good as far as it goes, but I think it would be improved by saying explicitly that conformance as a concept applies both to grammars and to parsers, and by separating the description of conformance for the two.

I append a sketch of a revised conformance section. I have included some notes, which are intended to be non-normative. I'm not sure whether the note reproducing the 'must' statements related to grammars is needed, or whether it is a good idea or a bad one. The note about conformance being meaningless for input may be unnecessary here, but I remember what a huge struggle it was to cleanse the XSD specification of wording that called constructs in XML documents "errors", instead of "invalid", when they did not match the description in the schema, even though "error" was normally used in that spec to describe failures to conform to the spec itself.

Full disclosure: On three points, the text below about parsers goes beyond what is in the draft of 2021-04-06.

- The penultimate item in the list talks about how the XML documents are to be returned. It is intended to encourage the kind of command-line interface that would allow an ixml parser to be used as a stage in a shell pipeline, but also to allow other interfaces (including -- full disclosure -- the interface I have specified for Aparecium, which is intended to be called from XSLT or XQuery and to produce an XDM instance that the rest of the program can then process).

- The final item is about parsing the whole of the input against the grammar, or only some portion of it. It is intended to address the topic we spent our meeting on today. It tries to require that the behavior Steven described be (a) available, and (b) the default, when the behavior is meaningful. For the case of streams of indeterminate length, however, it does not say anything about maximal consumption of the input or about greedy parsing, so it does not go as far as some of us suggested. Steven's use case involves files, if I understood him correctly, and I understood him to mean "normal" files, not special cases like processes or infinite streams. The text suggested is intended to say that a conforming parser may offer to work on infinite streams, but to say nothing more on that topic, on the principle of Least said, soonest mended.

- The item beginning "If more than one parse tree describes the input" proposes a change. The current spec says that if there is more than one tree, the parser must return one. The wording below weakens this requirement to say that parsers may return one, may return more than one, must be capable of returning just one, and that returning just one tree should be the default. We should discuss this to make sure people are happy with it.

Michael

................................................................

*Conformance

In this specification, descriptions of grammars or parsers using the verb "must" express unconditional requirements for conformance to the specification. Descriptions using the verb "should" express recommendations which the writers of grammars and the creators of parsers are encouraged to follow but which are not conditions of conformance. Descriptions using the verb "may" express optional features which are neither required nor prohibited.

Conformance to this specification can meaningfully be claimed for grammars and for parsers.

Note: although input described by a grammar is sometimes described as "obeying" or "conforming to" the grammar, conformance to this specification cannot be claimed of input streams or of input + grammar pairs.

**Conformance of grammars

An ixml grammar in ixml form conforms to this specification if

- it is described by the grammar given in this specification, and
- it satisfies all the other requirements specified for ixml grammars.

An ixml grammar in XML form conforms to this specification if

- it can be derived from an ixml grammar in ixml form by parsing as described in this specification, and
- it satisfies all the other requirements specified for ixml grammars.

Note: The normative formulations of conformance requirements are those given elsewhere in this specification. But for convenience the requirements that go beyond what is expressed in the grammar itself may be summarized as follows:

. All rule names that are serialised must match the requirements for an XML name.

. For every nonterminal name occurring on the right-hand side of a rule, some rule defining that name must exist in the grammar.

. The grammar must not contain more than one rule defining any given name.

. Terminal symbols must not be marked as attributes.

. Any character class used must be one that is listed in the Unicode specification.

. The number represented in a hex encoding of a character must be within the Unicode character range. (This entails that the hex value must not be that of a surrogate code point.)

. If the first rule in a grammar is marked as hidden, all of its productions must produce exactly one non-hidden nonterminal and no non-hidden terminals before or after that nonterminal.

. All nonterminal names which are marked to be serialised must match the requirements of an XML name.

Reasonable efforts have been made to make this list complete, but the omission of any conformance requirement from this list does not affect its status as a conformance requirement.

End note.
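
To make the summary concrete, here is a rough Python sketch that checks four of the listed requirements: at most one rule per name, no undefined nonterminals, XML names for serialised (non-hidden) rules, and hex-encoded characters within the Unicode range and outside the surrogates. The grammar representation, the function name check_grammar, and the ASCII-only name test are hypothetical simplifications for illustration, not anything the draft defines; a real checker would work on the parsed ixml grammar and apply the full XML 1.0 Name production.

```python
import re
from collections import Counter

# Hypothetical grammar representation, for illustration only: a list of
# (name, mark, alternatives) rules, where mark is "" (serialised), "-" (hidden),
# "@" (attribute) or "^"; each alternative is a list of symbols, and each symbol
# is ("nt", name), ("lit", text) or ("hex", code_point).

# Simplified, ASCII-only test for an XML Name; it rejects some names that
# XML 1.0 allows (for example names containing non-ASCII letters).
XML_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")


def check_grammar(rules):
    """Return messages for the requirements the grammar violates (empty list if none)."""
    problems = []
    counts = Counter(name for name, _mark, _alts in rules)
    # At most one rule may define any given name.
    for name, n in counts.items():
        if n > 1:
            problems.append(f"rule {name!r} defined {n} times")
    for name, mark, alts in rules:
        # Names that will be serialised (i.e. not hidden) must be XML names.
        if mark != "-" and not XML_NAME.fullmatch(name):
            problems.append(f"serialised name {name!r} is not an XML name")
        for alt in alts:
            for kind, value in alt:
                # Every nonterminal used on a right-hand side must be defined.
                if kind == "nt" and value not in counts:
                    problems.append(f"{value!r} used in {name!r} but never defined")
                # Hex-encoded characters must be Unicode scalar values.
                if kind == "hex" and (value > 0x10FFFF or 0xD800 <= value <= 0xDFFF):
                    problems.append(f"#{value:X} in {name!r} is not a Unicode character")
    return problems


grammar = [
    ("expr", "", [[("nt", "term")], [("nt", "expr"), ("lit", "+"), ("nt", "term")]]),
    ("term", "", [[("nt", "digit")], [("hex", 0xD800)]]),
    ("term", "", [[("lit", "x")]]),
    ("1st", "", [[("nt", "missing")]]),
]
for message in check_grammar(grammar):
    print(message)
```

The sample grammar deliberately breaks every checked rule once; a grammar that obeys them all yields an empty list.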

**Conformance of parsers

A parser conforms to this specification if it accepts grammars in ixml form and uses those grammars to parse input and produce XML documents representing parse trees as specified elsewhere in this specification.
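
As an illustration of the last step, here is a sketch of how a parser written in Python might turn a parse tree into an XML document with the standard library's xml.etree.ElementTree. It can optionally put the ixml:state attribute, which the requirements in this section use to mark ambiguous and failed parses, on the document element. The (name, children) tree representation and the function names are hypothetical. The namespace URI http://invisiblexml.org/NS is also an assumption (it is the one used by later ixml drafts); the text here does not state it.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for ixml attributes; not given in the draft text.
IXML_NS = "http://invisiblexml.org/NS"
ET.register_namespace("ixml", IXML_NS)


def to_element(node):
    """Convert a hypothetical (name, children) node to an Element; str children become text."""
    name, children = node
    elem = ET.Element(name)
    last = None  # most recent child element; text after it goes in its tail
    for child in children:
        if isinstance(child, str):
            if last is None:
                elem.text = (elem.text or "") + child
            else:
                last.tail = (last.tail or "") + child
        else:
            last = to_element(child)
            elem.append(last)
    return elem


def serialise(tree, state=None):
    """Serialise a tree as an XML string; state is None, "ambiguous" or "failed"."""
    root = to_element(tree)
    if state is not None:
        root.set(f"{{{IXML_NS}}}state", state)
    return ET.tostring(root, encoding="unicode")


tree = ("sum", [("number", ["1"]), "+", ("number", ["2"])])
print(serialise(tree, state="ambiguous"))
```

When no state is given, no namespace declaration is emitted, so an unambiguous successful parse produces plain XML.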

In addition to requirements mentioned elsewhere in this specification, the following also apply to conforming parsers:

- For any conforming grammar and any input, parsers must either

. successfully parse the input using the grammar specified, and produce at least one XML document showing a parse tree for the input, or
. successfully establish that the input is not described by the grammar, and produce an XML document reporting that fact, or
. fail for whatever reason (e.g. because available resource limits were exceeded).

Note: the first and second items of the list above, together with other requirements, entail that grammars must be processed by an algorithm that accepts and parses any context-free grammar. Known algorithms of this class include Earley parsing, Unger parsing, CYK parsing, GLR parsing, and GLL parsing.

- If more than one parse tree describes the input, the parser may produce any one of them. It is not defined how this choice is made, but the resulting parse must be marked as ambiguous by including the attribute ixml:state="ambiguous" on the document element of the serialisation. Parsers may produce more than one parse tree, but must provide a mode of operation in which they produce at most one; this single-tree mode should be the default.

- If the parse fails, the parser must produce a 'failure document', which is some XML document with ixml:state="failed" on the document element. The document should provide helpful information about where and why it failed; it may be a partial parse tree that includes parts of the parse that succeeded.

- If the root node in the grammar is marked as an attribute, parsers must ignore that marking when serialising the rule as the root.

- The form in which XML documents are produced is not constrained by this specification; where possible, parsers should be capable of producing serialised XML as a character stream, but other forms (e.g. DOM instances or XDM instances) may also be used.

- In the normal case, when the input has a determinate length (either known in advance or signaled by some end-of-stream signal), the parser must by default parse the input in its entirety against the grammar and return either a parse tree or a failure document. Parsers may provide user options to request other behaviors (such as parsing the largest, or smallest, prefix of the input that is described by the grammar). Parsers may also support invocation with input streams of indeterminate length.
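
The note on general context-free parsing and the final item on parsing the whole input can both be illustrated with a minimal Earley recognizer in Python. It only reports which prefixes of the input the start symbol derives (it builds no trees), and it handles empty rules with Aycock and Horspool's prediction fix. The grammar encoding, the function names, and the "full" and "longest-prefix" mode names are hypothetical. "full" plays the role of the required default; "longest-prefix" stands for one of the optional behaviors the item mentions.

```python
# Hypothetical grammar encoding, for illustration only: a dict from nonterminal
# name to a list of alternatives (tuples of symbols). Any symbol that is not a
# key of the dict is a terminal matching exactly one input character.


def nullable_set(grammar):
    """Nonterminals that can derive the empty string (computed as a fixpoint)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for name, alts in grammar.items():
            if name not in nullable and any(all(s in nullable for s in alt) for alt in alts):
                nullable.add(name)
                changed = True
    return nullable


def completions(grammar, start, text):
    """Earley recognizer: return every p such that start derives text[:p]."""
    nullable = nullable_set(grammar)
    n = len(text)
    sets = [[] for _ in range(n + 1)]     # item lists, one per input position
    seen = [set() for _ in range(n + 1)]  # duplicate suppression per position

    def add(k, item):
        if item not in seen[k]:
            seen[k].add(item)
            sets[k].append(item)

    # An item is (lhs, alternative, dot, origin).
    for alt in grammar[start]:
        add(0, (start, alt, 0, 0))
    ends = set()
    for k in range(n + 1):
        i = 0
        while i < len(sets[k]):  # sets[k] may grow while we walk it
            lhs, alt, dot, origin = sets[k][i]
            i += 1
            if dot == len(alt):
                # Completer: advance every item in the origin set waiting on lhs.
                if lhs == start and origin == 0:
                    ends.add(k)
                for l2, a2, d2, o2 in list(sets[origin]):
                    if d2 < len(a2) and a2[d2] == lhs:
                        add(k, (l2, a2, d2 + 1, o2))
            elif alt[dot] in grammar:
                # Predictor; for a nullable symbol also step over it
                # (Aycock and Horspool's fix for empty rules).
                sym = alt[dot]
                for a in grammar[sym]:
                    add(k, (sym, a, 0, k))
                if sym in nullable:
                    add(k, (lhs, alt, dot + 1, origin))
            elif k < n and text[k] == alt[dot]:
                # Scanner: the terminal matches the next input character.
                add(k + 1, (lhs, alt, dot + 1, origin))
    return ends


def parse_mode(grammar, start, text, mode="full"):
    """Return the end of the described input, or None if nothing qualifies."""
    ends = completions(grammar, start, text)
    if mode == "full":  # default: the whole input must be described
        return len(text) if len(text) in ends else None
    if mode == "longest-prefix":  # an optional, non-default behavior
        return max(ends, default=None)
    raise ValueError(f"unknown mode {mode!r}")


grammar = {
    "expr": [("expr", "+", "expr"), ("digit",)],
    "digit": [("0",), ("1",), ("2",)],
}
print(parse_mode(grammar, "expr", "1+2"))  # 3: the whole input is described
print(parse_mode(grammar, "expr", "1+2+"))  # None: only a prefix is described
print(parse_mode(grammar, "expr", "1+2+", mode="longest-prefix"))  # 3
```

The sample grammar is both left-recursive and ambiguous, which an Earley recognizer accepts without rewriting. Because the chart already records every position at which the start symbol completes, offering a prefix option alongside the whole-input default costs little extra work.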

********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
cmsmcq@blackmesatech.com
http://www.blackmesatech.com
********************************************

Received on Wednesday, 14 April 2021 00:42:09 UTC