ixml and non-xml output - is there an error and if so where? from C. M. Sperberg-McQueen on 2021-12-21 (public-ixml@w3.org from December 2021)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Tue, 21 Dec 2021 10:39:54 -0700
To: ixml <public-ixml@w3.org>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Message-Id: <653D301C-7AAE-4419-AEF9-6CB778703230@blackmesatech.com>

Working through Steven’s test suite, I have encountered the test
encoded in

  tests/expr1.ixml
  tests/expr1.inp
  tests/expr1.req

You may want to look at the test case, but the short description is
simple: the grammar marks a repeating nonterminal with ‘@’ and a
straightforward serialization of the parse tree produces

<expression plusop='+' plusop='+' plusop='+'>
   ...
</expression>

which is not well-formed XML because it violates the "Unique Att Spec"
well-formedness constraint.
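
The violation can be confirmed with any conforming XML parser. Below is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The helper name is_well_formed is invented for illustration, and the sample strings use empty elements in place of the elided content, so they are not the actual output of the test case.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # The underlying parser rejects a repeated attribute name
        # on the same start tag, among other violations.
        return False


# Illustrative strings only; the element content is an assumption.
duplicate = "<expression plusop='+' plusop='+' plusop='+'/>"
single = "<expression plusop='+'/>"

print(is_well_formed(duplicate))
print(is_well_formed(single))
```

A naive serializer that simply concatenates strings would emit the first form without complaint, so the problem only surfaces when something downstream tries to parse the result.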

Is this a run-time error in the grammar analogous to using a
nonterminal which is not an XML name but which turns out to
need to be serialized as an element or attribute name?

Is it a run-time error that is not in itself an error in the grammar
but just something that prevents successful serialization of the
result, analogous to an out-of-file-space error?  (Except that this is
not a transient condition external to the grammar / input pair: it will
happen every time we parse the given input against the given grammar.)

The draft test-suite catalog vocabulary I have been working with so
far allows three distinct assertions about a test case:

  - Parsing the input against the grammar produces XML structurally
    equal to ... (an XML element given inline or externally).

  - The input is not a sentence in the language defined by the 
    grammar (and it thus has no parse tree). 

  - The character stream or XML element supplied as a grammar is not
    in fact a conforming ixml grammar (and thus cannot be used to
    parse the input).

This situation appears to require a new kind of assertion. What should
that assertion be?

Perhaps

  - The abstract syntax tree described by the grammar for this input
    cannot be serialized as well-formed XML.

?  That would cover both unserializable nonterminals and multiple
attributes.
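
To make the proposal concrete, here is a rough sketch in Python of how a test harness might represent the four assertion kinds and check a parse tree for both problems before serializing it. All of it is hypothetical: the enum member names, the Node class, and the function serialization_problems are invented for illustration and come from neither the spec nor the draft catalog vocabulary. The name check is a deliberately simplified ASCII-only approximation of the XML Name production.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Assertion(Enum):
    """Hypothetical labels for the assertion kinds discussed above."""
    RESULT_EQUALS = "result"            # parse yields the expected XML
    NOT_A_SENTENCE = "not-a-sentence"   # input not in the language
    NOT_A_GRAMMAR = "not-a-grammar"     # grammar is not conforming ixml
    NOT_SERIALIZABLE = "not-serializable"  # proposed fourth kind


@dataclass
class Node:
    """Toy parse-tree node: attributes kept as a list so repeats survive."""
    name: str
    attributes: List[Tuple[str, str]] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


# Simplified stand-in for the XML Name production (ASCII only).
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*\Z")


def serialization_problems(node: Node) -> List[str]:
    """Collect reasons this tree cannot be written as well-formed XML."""
    problems = []
    for name in [node.name] + [n for n, _ in node.attributes]:
        if not _NAME.match(name):
            problems.append(f"not an XML name: {name!r}")
    seen = set()
    for name, _ in node.attributes:
        if name in seen:
            problems.append(f"duplicate attribute {name!r} on <{node.name}>")
        seen.add(name)
    for child in node.children:
        problems.extend(serialization_problems(child))
    return problems


tree = Node("expression", attributes=[("plusop", "+")] * 3)
for problem in serialization_problems(tree):
    print(problem)
```

Under this sketch a harness would report NOT_SERIALIZABLE whenever the list of problems is non-empty. Whether that outcome should instead be folded into NOT_A_GRAMMAR is exactly the open question.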

But if unserializable nonterminals and multiple attributes are
regarded as errors in the grammar (perhaps only detectable at input
parse time), then the correct assertion for this test case is the
'not-a-grammar' assertion.

As far as I can tell, the spec does not currently address violations
of well-formedness constraints other than unserializable names.  It
needs to.

Michael

p.s. For now, I am marking this as an error in the grammar in my
catalog of Steven’s test cases.

Received on Tuesday, 21 December 2021 17:40:13 UTC