Re: ixml and non-xml output - is there an error and if so where? from Tom Hillman on 2022-01-04 (public-ixml@w3.org from January 2022)

From: Tom Hillman <tom@expertml.com>
Date: Tue, 4 Jan 2022 12:03:42 +0000
To: Steven Pemberton <steven.pemberton@cwi.nl>, "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Cc: ixml <public-ixml@w3.org>
Message-ID: <3e587073-8395-4007-8a6a-3574a15470d7@Spark>

Happy New Year!

Slightly insincere apologies for not replying sooner: I have been enforcing a policy of snoozing 'work' related emails this holiday season while we've been travelling and visiting family for Christmas and New Year, while dodging the 'Rona.

I think that I expressed some concern over committing to rejecting non-conforming grammars. The distinction between 'static' and 'dynamic' errors here is a useful step towards recognising what is feasible and/or possible, and I agree with Michael that we should concentrate on accepting grammars without 'static' errors, and merely report dynamic errors as they occur.
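
To make the distinction concrete, here is a rough sketch of the two kinds of check, not a proposal for spec wording. Everything in it is an assumption for illustration: the rule_marks dictionary (rule name mapped to its ixml mark), the function names, and the ASCII-only approximation of an XML name. The real XML Name production admits far more characters, and the sample names make no claim about what ixml itself accepts.

```python
import re
from collections import Counter

# Simplified, ASCII-only stand-in for the XML Name production (assumption).
XML_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def static_errors(rule_marks):
    """Errors visible from the grammar alone, before any input is parsed.

    rule_marks maps a (hypothetical) rule name to its mark:
    '^' element, '@' attribute, '-' not serialised.
    """
    return [
        f"static: '{name}' is not an XML name"
        for name, mark in rule_marks.items()
        if mark in ("^", "@") and not XML_NAME.match(name)
    ]


def dynamic_errors(element_name, attribute_names):
    """Errors visible only once a particular input has been parsed:
    here, an element that would carry the same attribute more than once."""
    counts = Counter(attribute_names)
    return [
        f"dynamic: <{element_name}> has {n} '{attr}' attributes"
        for attr, n in counts.items()
        if n > 1
    ]


# Hypothetical marks, loosely modelled on the expr1 discussion.
marks = {"expr": "^", "plusop": "@", "2nd-term": "^", "ws": "-"}

print(static_errors(marks))                          # checked once per grammar
print(dynamic_errors("expr", ["plusop"]))            # an input that is fine
print(dynamic_errors("expr", ["plusop", "plusop"]))  # an input that is not
```

The point of the split is that static_errors needs only the grammar, so a processor can run it before accepting the grammar, whereas dynamic_errors can only be evaluated per parse and so can only be reported as it occurs.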

Tom

_________________
Tomos Hillman
eXpertML Ltd
+44 7793 242058
On 22 Dec 2021, 19:07 +0000, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>, wrote:
> On 22 Dec 2021, at 7:43 AM, Steven Pemberton <steven.pemberton@cwi.nl> wrote:
>
> > I think this should be our philosophical position:
>
> > ixml provides a way for the author to convert non-XML documents into
> > XML. It is up to the author to write ixml so that it produces
> > correct XML, and therefore to ensure that:
>
> > * serialised names are correct XML names,
> > * attribute and element content do not contain illegal
> > characters,
> > * no element has more than one attribute of a given name,
>
> > and not worry more about these issues within the ixml definition.
>
> > As to classifications, in my recent mail on this I proposed a 5-way
> > split
>
> > 1. ixml grammar syntax errors
> > 2. ixml grammar semantic errors
> > 3. ixml grammar correct, input and grammar don't match
> > 4. ixml grammar correct, input is ambiguous
> > 5. ixml grammar correct, test completes correctly
>
> > What you are proposing I think is a sixth:
>
> > 6. ixml grammar correct, test completes correctly, resulting XML is
> > in error as a result of authoring errors.
>
> My answer has grown into two different messages: one about test suites
> and one about parsing outcomes and errors. This is the one about
> outcomes.
>
> I think your five-way split makes sense as a rough classification of
> outcomes, though I don't understand your second item and I think that
> items 4 and 5 belong together.
>
> The test case tests/expr1.* may lead us to postulate a sixth kind of
> outcome, but I don't think I understand the problem well enough to
> make such a proposal now, and your description worries me a bit.
>
> My first concern is with the words "test completes correctly". I am
> not confident that the test will complete at all in my processor,
> since the well-formedness errors may well cause a run-time exception.
> (I haven't yet tried it, so I don't know.)
>
> My second concern is that I don't know what 'correctly' might mean
> here.
>
> Third, I don't know who the 'author' is who has committed authoring
> errors.
>
> If the "author" here is the writer of the input, then this sounds like
> saying that if the input to expr1.ixml produces non-well-formed
> output, then the input is not as expected, and the correct
> specification of the expected result is to say that the input is not a
> sentence in the language defined by the grammar.
>
> But the input does conform to the grammar as written, interpreted
> solely as a context-free grammar: it's just the annotations for XML
> serialization that cause the problem. If 'plusop' were marked ^
> rather than @, there would be no problem serializing the result. On
> balance, I don't think this is the right analysis.
>
> If the "author" you have in mind is the writer of the ixml grammar,
> then this sounds like saying that a grammar that produces
> non-well-formed output is a faulty grammar. If so, then I think the
> correct specification of the expected result is to say that expr1.ixml
> is not a conforming grammar.
>
> That analysis does have a wrinkle. If the file tests/expr1.ixml is
> not a conforming ixml grammar, then a conforming processor is
> currently required to reject it, even though on some inputs
> (e.g. "1+2") the problem will not be visible. So we seem to have a
> choice:
>
> - We can say that conforming processors must detect the problem
> here, even if the input does not exercise it.
>
> or
>
> - We can say that conforming processors are not required after all
> to detect and report errors in grammars.
>
> The same choice arises in connection with checking whether
> nonterminals are legal names, though we didn't notice when we were
> discussing that topic.
>
>
> Perhaps we do need another category.
>
> I think the philosophical position you outline here, and what you have
> proposed in connection with nonterminals and XML names, amount to
> distinguishing two classes of errors:
>
> (1) errors in grammars we can check for and detect, independently of
> any input string, and
>
> (2) errors that are only detectable (or only readily detectable) given
> both grammar and input.
>
> I'll call these static and dynamic errors for short.
>
> Since static errors are detectable by examining grammars in isolation,
> it's plausible to call them errors in the grammar.
>
> Since dynamic errors are not required to be detected (or possibly not
> detectable) by examining grammars in isolation, we may or may not call
> them errors in the grammar, or errors in the input, or something else.
>
> They are perhaps errors in the grammar: they are failures of the
> grammar writer to guarantee that the markings specify well-formed
> output for all grammatical inputs. (But we require processors to
> reject non-conforming grammars: if non-conformance is not statically
> detectable, that requirement is impossible to satisfy.)
>
> They are perhaps errors in the input. Only, 'error' in a spec like
> ours usually means that something is non-conforming. We have
> conformance rules for grammars and for processors; input is not
> something that can conform or fail to conform to our spec.
>
> Or perhaps they are best described as dynamic errors or dynamic
> exceptions, without pointing either at the grammar or at the input.
> They are situations that can arise while parsing input against a
> conforming grammar.
>
> If we do need another category of outcome from a parsing run or test
> case (I'm still thinking), I am tempted to say:
>
> - Such a run-time exception is not an error in the grammar; conforming
> grammars can have run-time exceptions on some inputs. (So
> processors are not required to detect and reject grammars that could
> have run-time exceptions for some inputs.)
>
> - Such a run-time exception is not a sign that the input is not
> grammatical.
>
> - Conforming processors are required to report such run-time
> exceptions and MAY recover from them. (In the case of expr1.*,
> recovery might take the form of choosing any one of the 'plusOp'
> attribute-value specifications to include and discarding the others,
> or it might take the form of ignoring the @ marking on plusOp and
> serializing it as an element.)
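
The two recovery strategies mentioned in the last bullet could look roughly like this. It is a minimal sketch under stated assumptions: the serialise function, its recovery parameter and the "keep-one" / "as-elements" labels are invented for illustration, and "keep-one" arbitrarily keeps the first occurrence, which is only one of the permitted choices. The escaping helpers come from Python's standard xml.sax.saxutils module.

```python
from collections import Counter
from xml.sax.saxutils import escape, quoteattr


def serialise(name, attrs, recovery="keep-one"):
    """Serialise one element from (attribute name, value) pairs that may
    contain repeats. Returns (xml_text, reports); the duplicate is always
    reported, and the recovery argument decides what is emitted anyway."""
    counts = Counter(attr for attr, _ in attrs)
    duplicated = {attr for attr, n in counts.items() if n > 1}
    reports = [
        f"dynamic error: <{name}> has {counts[attr]} '{attr}' attributes"
        for attr in sorted(duplicated)
    ]

    attr_out, child_out, seen = [], [], set()
    for attr, value in attrs:
        if attr in duplicated and recovery == "as-elements":
            # Ignore the @ marking: every occurrence becomes a child element.
            child_out.append(f"<{attr}>{escape(value)}</{attr}>")
        elif attr not in seen:
            seen.add(attr)
            attr_out.append(f" {attr}={quoteattr(value)}")
        # Otherwise this is a repeat under "keep-one": already reported, dropped.

    xml = f"<{name}{''.join(attr_out)}>{''.join(child_out)}</{name}>"
    return xml, reports


pairs = [("plusop", "+"), ("plusop", "-")]
print(serialise("expr", pairs, recovery="keep-one"))
print(serialise("expr", pairs, recovery="as-elements"))
```

Either way the report is produced first, so a caller can tell a clean result from a recovered one.
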
>
> Michael
Received on Tuesday, 4 January 2022 12:04:14 UTC