- From: Tom Hillman <tom@expertml.com>
- Date: Tue, 4 Jan 2022 12:03:42 +0000
- To: Steven Pemberton <steven.pemberton@cwi.nl>, "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Cc: ixml <public-ixml@w3.org>
- Message-ID: <3e587073-8395-4007-8a6a-3574a15470d7@Spark>
Happy New Year!

Slightly insincere apologies for not replying sooner: I have been enforcing a policy of snoozing 'work' related emails this holiday season while we've been travelling and visiting family for Christmas and New Year, while dodging the 'Rona.

I think that I expressed some concern over committing to rejecting non-conforming grammars: I think that the distinction between 'static' and 'dynamic' errors here is a useful step towards recognising what is feasible and/or possible, and I agree with Michael that we should concentrate on accepting grammars without 'static' errors, and merely report static errors as they occur.

Tom

_________________
Tomos Hillman
eXpertML Ltd
+44 7793 242058

On 22 Dec 2021, 19:07 +0000, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>, wrote:
> On 22,Dec2021, at 7:43 AM, Steven Pemberton <steven.pemberton@cwi.nl> wrote:
>
> > I think this should be our philosophical position:
>
> > ixml provides a way for the author to convert non-XML documents into
> > XML. It is up to the author to write ixml so that it produces
> > correct XML, and therefore to ensure that:
>
> > * serialised names are correct XML names,
> > * attribute and element content do not contain illegal
> >   characters,
> > * any element does not have more than one attribute of a given name
> > and not worry more about these issues within the ixml definition.
>
> > As to classifications, in my recent mail on this I proposed a 5-way
> > split
>
> > 1. ixml grammar syntax errors
> > 2. ixml grammar semantic errors
> > 3. ixml grammar correct, input and grammar don't match
> > 4. ixml grammar correct, input is ambiguous
> > 5. ixml grammar correct, test completes correctly
>
> > What you are proposing I think is a sixth:
>
> > 6. ixml grammar correct, test completes correctly, resulting XML is
> >    in error as a result of authoring errors.
>
> My answer has grown into two different messages: one about test suites
> and one about parsing outcomes and errors. This is the one about
> outcomes.
>
> I think your five-way split makes sense as a rough classification of
> outcomes, though I don't understand your second item and I think that
> items 4 and 5 belong together.
>
> The test case tests/expr1.* may lead us to postulate a sixth kind of
> outcome, but I don't think I understand the problem well enough to
> make such a proposal now, and your description worries me a bit.
>
> My first concern is with the words "test completes correctly". I am
> not confident that the test will complete at all in my processor,
> since the well-formedness errors may well cause a run-time exception.
> (I haven't yet tried it, so I don't know.)
>
> My second concern is that I don't know what 'correctly' might mean
> here.
>
> Third, I don't know who the 'author' is who has committed authoring
> errors.
>
> If the "author" here is the writer of the input, then this sounds like
> saying that if the input to expr1.ixml produces non-well-formed
> output, then the input is not as expected, and the correct
> specification of the expected result is to say that the input is not a
> sentence in the language defined by the grammar.
>
> But the input does conform to the grammar as written, interpreted
> solely as a context-free grammar: it's just the annotations for XML
> serialization that cause the problem. If 'plusop' were marked ^
> rather than @, there would be no problem serializing the result. On
> balance, I don't think this is the right analysis.
>
> If the "author" you have in mind is the writer of the ixml grammar,
> then this sounds like saying that a grammar that produces
> non-well-formed output is a faulty grammar. If so, then I think the
> correct specification of the expected result is to say that expr1.ixml
> is not a conforming grammar.
>
> That analysis does have a wrinkle. If the file tests/expr1.ixml is
> not a conforming ixml grammar, then a conforming processor is
> currently required to reject it, even though on some inputs
> (e.g. "1+2") the problem will not be visible. So we seem to have a
> choice:
>
> - We can say that conforming processors must detect the problem
>   here, even if the input does not exercise it.
>
> or
>
> - We can say that conforming processors are not required after all
>   to detect and report errors in grammars.
>
> The same choice arises in connection with checking whether
> nonterminals are legal names, though we didn't notice when we were
> discussing that topic.
>
> > Perhaps we do need another category.
>
> I think the philosophical position you outline here, and what you have
> proposed in connection with nonterminals and XML names, amount to
> distinguishing two classes of errors:
>
> (1) errors in grammars we can check for and detect, independently of
>     any input string, and
>
> (2) errors that are only detectable (or only readily detectable) given
>     both grammar and input.
>
> I'll call these static and dynamic errors for short.
>
> Since static errors are detectable by examining grammars in isolation,
> it's plausible to call them errors in the grammar.
>
> Since dynamic errors are not required to be detected (or possibly not
> detectable) by examining grammars in isolation, we may or may not call
> them errors in the grammar, or errors in the input, or something else.
>
> They are perhaps errors in the grammar: they are failures of the
> grammar writer to guarantee that the markings specify well-formed
> output for all grammatical inputs. (But we require processors to
> reject non-conforming grammars: if non-conformance is not statically
> detectable, that requirement is impossible to satisfy.)
>
> They are perhaps errors in the input. Only, 'error' in a spec like
> ours usually means that something is non-conforming. We have
> conformance rules for grammars and for processors; input is not
> something that can conform or fail to conform to our spec.
>
> Or perhaps they are best described as dynamic errors or dynamic
> exceptions, without pointing either at the grammar or at the input.
> They are situations that can arise while parsing input against a
> conforming grammar.
>
> If we do need another category of outcome from a parsing run or test
> case (I'm still thinking), I am tempted to say:
>
> - Such a run-time exception is not an error in the grammar; conforming
>   grammars can have run-time exceptions on some inputs. (So
>   processors are not required to detect and reject grammars that could
>   have run-time exceptions for some inputs.)
>
> - Such a run-time exception is not a sign that the input is not
>   grammatical.
>
> - Conforming processors are required to report such run-time
>   exceptions and MAY recover from them. (In the case of expr1.*,
>   recovery might take the form of choosing any one of the 'plusOp'
>   attribute-value specifications to include and discarding the others,
>   or it might take the form of ignoring the @ marking on plusOp and
>   serializing it as an element.)
>
> Michael
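
For readers of the archive, here is a rough sketch of the "report, and MAY recover" behaviour Michael describes for the duplicate-attribute case in expr1.*. It is illustrative only and not taken from any ixml processor: the names DynamicError and collect_attributes are hypothetical, and recovery by keeping the first value is just one of the two recovery options he mentions.

```python
class DynamicError(Exception):
    """Hypothetical run-time exception: the output would not be well-formed XML."""


def collect_attributes(element_name, pairs, recover=False):
    """Build the attribute map for one element from (name, value) pairs.

    A repeated attribute name is always reported. Without recovery the
    report is raised as DynamicError; with recovery the first value is
    kept, later values are discarded, and the report is returned instead.
    """
    attributes = {}
    reports = []
    for name, value in pairs:
        if name in attributes:
            message = f"duplicate attribute {name!r} on <{element_name}>"
            if not recover:
                raise DynamicError(message)
            reports.append(message)
            continue  # assumed recovery policy: keep the first value
        attributes[name] = value
    return attributes, reports


# One 'plusop' attribute serializes without trouble; two on the same
# element trigger the dynamic error, here reported and recovered from.
attrs, reports = collect_attributes(
    "expr", [("plusop", "+"), ("plusop", "-")], recover=True
)
```

The check needs both grammar and input, since it only fires when a particular parse yields two attributes with the same name, which is why it cannot be treated as a static error in the grammar.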
Received on Tuesday, 4 January 2022 12:04:14 UTC