Re: ixml and non-xml output - is there an error and if so where? from Steven Pemberton on 2021-12-22 (public-ixml@w3.org from December 2021)

From: Steven Pemberton <steven.pemberton@cwi.nl>
Date: Wed, 22 Dec 2021 14:43:10 +0000
To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, ixml <public-ixml@w3.org>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Message-Id: <1640182618034.2600723246.3723355486@cwi.nl>

I think this should be our philosophical position:
ixml provides a way for the author to convert non-XML documents into XML.
It is up to the author to write ixml that produces correct XML, and therefore to ensure that:
* serialised names are correct XML names,
* attribute and element content does not contain illegal characters,
* no element has more than one attribute with a given name,
and we should not worry further about these issues within the ixml definition.
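
To make the three conditions concrete, here is a minimal sketch of the checks a serialiser (or a cautious author) could run on one element before writing it out. It is not taken from the ixml spec or from any implementation: the function name `serialization_problems` and the (name, attributes, text) shape are hypothetical, and the name pattern is a deliberately simplified ASCII-only approximation of the XML Name production.

```python
import re
from collections import Counter

# Simplified, ASCII-only approximation of an XML name (assumption: the real
# Name production also admits ':' and many non-ASCII characters).
ASCII_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*\Z")

# Any character outside the XML 1.0 Char production.
ILLEGAL_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def serialization_problems(name, attributes, text):
    """Return a list of reasons why this element cannot be written as XML.

    name       -- element name (str)
    attributes -- list of (attribute name, value) pairs, duplicates allowed
    text       -- character content of the element (str)
    """
    problems = []

    # 1. Serialised names must be XML names.
    for n in [name] + [attr_name for attr_name, _ in attributes]:
        if not ASCII_NAME.match(n):
            problems.append(f"not an XML name: {n!r}")

    # 2. Content and attribute values must contain only legal characters.
    for value in [text] + [attr_value for _, attr_value in attributes]:
        found = ILLEGAL_CHAR.search(value)
        if found:
            problems.append(f"illegal character U+{ord(found.group()):04X}")

    # 3. No attribute name may appear more than once on one element.
    counts = Counter(attr_name for attr_name, _ in attributes)
    for attr_name, count in counts.items():
        if count > 1:
            problems.append(f"attribute {attr_name!r} appears {count} times")

    return problems
```

For the expr1 case, `serialization_problems("expression", [("plusop", "+")] * 3, "")` reports the repeated `plusop` attribute and nothing else.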

As to classifications, in my recent mail on this I proposed a 5-way split:

1. ixml grammar syntax errors 
2. ixml grammar semantic errors 
3. ixml grammar correct, input and grammar don't match 
4. ixml grammar correct, input is ambiguous 
5. ixml grammar correct, test completes correctly

What you are proposing, I think, is a sixth:

6. ixml grammar correct, test completes correctly, resulting XML is in error as a result of authoring errors.
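
As a rough illustration of how a test harness might encode the six cases, here is a small sketch. The `Outcome` names and the `classify` function are hypothetical, not part of any test-suite catalog vocabulary, and the choice to report ill-formed output before ambiguity is my own assumption.

```python
from enum import Enum


class Outcome(Enum):
    GRAMMAR_SYNTAX_ERROR = 1    # 1. ixml grammar syntax errors
    GRAMMAR_SEMANTIC_ERROR = 2  # 2. ixml grammar semantic errors
    NO_MATCH = 3                # 3. input and grammar don't match
    AMBIGUOUS = 4               # 4. input is ambiguous
    SUCCESS = 5                 # 5. test completes correctly
    OUTPUT_NOT_WELL_FORMED = 6  # 6. resulting XML is in error


def classify(syntax_ok, semantics_ok, parse_count, output_well_formed):
    """Map the observations from one test run onto the six categories.

    parse_count is the number of distinct parses found for the input.
    """
    if not syntax_ok:
        return Outcome.GRAMMAR_SYNTAX_ERROR
    if not semantics_ok:
        return Outcome.GRAMMAR_SEMANTIC_ERROR
    if parse_count == 0:
        return Outcome.NO_MATCH
    # Assumption: ill-formed output takes precedence over ambiguity.
    if not output_well_formed:
        return Outcome.OUTPUT_NOT_WELL_FORMED
    if parse_count > 1:
        return Outcome.AMBIGUOUS
    return Outcome.SUCCESS
```

Under this sketch the expr1 test, where the grammar is fine and the input parses once but the attributes collide, comes out as `classify(True, True, 1, False)`, that is, `Outcome.OUTPUT_NOT_WELL_FORMED`.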

Steven

On Tuesday 21 December 2021 18:39:54 (+01:00), C. M. Sperberg-McQueen wrote:

> Working through Steven’s test suite, I have encountered the test
> encoded in
>
> tests/expr1.ixml
> tests/expr1.inp
> tests/expr1.req
>
> You may want to look at the test case, but the short description is
> simple: the grammar marks a repeating nonterminal with ‘@’ and a
> straightforward serialization of the parse tree produces
>
> <expression plusop='+' plusop='+' plusop='+'>
> ...
> </expression>
>
> which is not well-formed XML because it violates the "Unique Att Spec"
> well-formedness constraint.
>
> Is this a run-time error in the grammar analogous to using a
> nonterminal which is not an XML name but which turns out to
> need to be serialized as an element or attribute name?
>
> Is it a run-time error that is not in itself an error in the grammar
> but just something that prevents successful serialization of the
> result, analogous to an out-of-file-space error? (Only this is not a
> transient condition external to the grammar / input pair, this will
> happen every time we parse the given input against the given grammar.)
>
> The draft test-suite catalog vocabulary I have been working with so
> far allows three distinct assertions about a test case:
>
> - Parsing the input against the grammar produces an XML structurally
> equal to ... (an XML element given inline or externally).
>
> - The input is not a sentence in the language defined by the
> grammar (and it thus has no parse tree).
>
> - The character stream or XML element supplied as a grammar is not
> in fact a conforming ixml grammar (and thus cannot be used to
> parse the input).
>
> This situation appears to require a new kind of assertion. What should
> that assertion be?
>
> Perhaps
>
> - The abstract syntax tree described by the grammar for this input
> cannot be serialized as well-formed XML.
>
> ? That would cover both unserializable nonterminals and multiple
> attributes.
>
> But if unserializable nonterminals and multiple attributes are
> regarded as errors in the grammar (perhaps only detectable at input
> parse time), then the correct assertion for this test case is the
> 'not-a-grammar' assertion.
>
> As far as I can tell, the spec does not currently address violations
> of well-formedness constraints other than unserializable names. It
> needs to.
>
> Michael
>
> p.s. For now, I am marking this as an error in the grammar in my
> catalog of Steven’s test cases.

Received on Wednesday, 22 December 2021 14:43:34 UTC