Re: integrating error codes into the test suite and its schema

Norm Tovey-Walsh writes:

>> I will try to respond in more detail later, but for now I'll just say we
>> do seem to have a terminological issue.

> Right. Okay. Let’s see if I can enumerate the possibilities.

On the theory that in the long run the only way to improve communication
is to explain in sometimes tedious detail what one understands words to
mean, I have commented here at greater length than might otherwise be
useful.

> For a given grammar G which defines a language L(G) and an input S:
>
> 0. The user might fail to provide usable inputs. The files might not
>    be found, the URIs might 404, the input might have encoding errors.
>    Let’s say those problems are out-of-scope.

Right.  (Fwiw, a test driver can then record a result of 'not-run' for
those cases, though I suspect mine may sometimes report 'other'.)


> 1. G might be syntactically invalid.

Right.

In the test catalog as I understand it, on a grammar test the expected
result for this case will be assert-not-a-sentence.  (Or, if the test
creator prefers, assert-not-a-grammar can also be used.  The input
grammar is not a sentence in L(G), and it is also not a conforming ixml
grammar.  I lean toward assert-not-a-sentence as slightly more
informative.)

For test sets where the grammar is known faulty, actual test cases are
not usually helpful (or so it seems to me), but if such a test set does
contain a test-case element, then its expected result needs to be
assert-not-a-grammar.
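
To make that concrete, here is roughly the shape of catalog entry I
have in mind for case 1.  (A sketch only: I am writing the markup from
memory, so the wrapper elements and attribute names -- result,
test-string, href, and so on -- should be checked against the actual
catalog schema, and the file names are invented.)

    <test-set name="grammar-with-syntax-error">
      <!-- G is the input here; it fails to parse against the spec
           grammar, so it is not a sentence -->
      <ixml-grammar-ref href="bad-syntax.ixml"/>
      <grammar-test>
        <result>
          <assert-not-a-sentence/>
        </result>
      </grammar-test>
      <!-- if a test-case appears at all, its expected result must
           record that G is not a conforming ixml grammar -->
      <test-case name="any-input">
        <test-string>whatever</test-string>
        <result>
          <assert-not-a-grammar/>
        </result>
      </test-case>
    </test-set>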


> 2. G might be syntactically valid but violate some semantic
>    constraint (e.g., that a nonterminal must have at most one
>    definition).

Right.

In the test catalog, expected results for both a grammar-test and a
test-case element will be assert-not-a-grammar.  
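
Concretely (with the same caveats about names as in the case-1 sketch
above, plus two more: the code "S01" is purely a placeholder, and
putting an @error attribute on assert-not-a-grammar is part of what we
are discussing, not something the current schema promises):

    <test-set name="multiply-defined-nonterminal">
      <ixml-grammar-ref href="duplicate-rule.ixml"/>
      <grammar-test>
        <result>
          <assert-not-a-grammar error="S01"/>
        </result>
      </grammar-test>
      <test-case name="any-input">
        <test-string>whatever</test-string>
        <result>
          <assert-not-a-grammar error="S01"/>
        </result>
      </test-case>
    </test-set>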


> 3. G might be syntactically and semantically valid.

Right.  In that case, a grammar-test will have an assert-xml or
assert-xml-ref as the expected result.  (Not more than one, I think,
unless we find an ambiguity in the spec grammar.)
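
Concretely, a case-3 grammar test as I imagine it (same caveats about
element names and file names):

    <test-set name="valid-grammar">
      <ixml-grammar-ref href="good.ixml"/>
      <grammar-test>
        <result>
          <!-- the XML form of G, as parsed against the spec grammar -->
          <assert-xml-ref href="good.xml"/>
        </result>
      </grammar-test>
    </test-set>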

> The input S is only relevant in case 3, I think.

Right.


> 4. S might not be in L(G)

Right.

It is this case in which you have puzzled me by using the term 'error'.

For this situation I expect a test-case element to have an expected
result of assert-not-a-sentence.  I do not expect any existing ixml
error code to be applicable, and my gut reaction is to say that if a
processor reports an ixml error code here, it is "in error", both in the
ordinary-language sense that it has made a mistake and in the
spec-drafting sense that it is exhibiting non-conformant behavior
(ixml error codes should be used only where applicable and only for
errors in the spec-technical sense).  Consequently, I expect the
processor to score a wrong-error result on the test case (assuming that
it does correctly signal that S is not in L(G) -- if it doesn't do that,
it gets a fail, not a wrong-error).
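
So for case 4 I expect something like the following (names invented as
before); note the absence of any @error attribute, precisely because no
ixml error code is applicable:

    <test-case name="string-not-in-language">
      <test-string>some string not in L(G)</test-string>
      <result>
        <!-- no @error attribute: no ixml error code applies here -->
        <assert-not-a-sentence/>
      </result>
    </test-case>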

When the processor reports a parsing failure, it will do so however its
designer thinks best, and whether it uses the term 'parsing failure' or
'parse error' or just 'error' is no pressing concern of mine.  But it
does not become an 'error' in our context (writing the spec) merely
because the processor's messages happen to say 'error' rather than
'condition' or 'exception' or whatever.


> 5. Analysis of S might cause the processor to fail (due to insufficient
>    memory, for example)

Right.

This will never, I guess, be the expected result of a test, but when it
occurs my test harness reports a result of 'fail' in some cases (e.g.
when parsing raised an uncaught exception in XQuery) and 'other' in
others (e.g. an out-of-time condition).


> 6. S might be in L(G) but produce an invalid serialization.

Right.

The expected result of such a test case should be assert-dynamic-error
(assuming there is just the one error ...), and we will want an @error
attribute with one or more D\d\d values.
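
For example (the codes shown are invented, standing in for whichever
D\d\d codes actually apply):

    <test-case name="invalid-serialization">
      <test-string>a string in L(G)</test-string>
      <result>
        <!-- one or more applicable dynamic-error codes -->
        <assert-dynamic-error error="D01 D02"/>
      </result>
    </test-case>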


> 7. S might be in L(G) and produce a well-formed XML serialization.

Right.


> Does that cover all the cases?

I think so. 

>> I think "error" is not the correct term for all cases in the second
>> class.  If we have to have one, 'failure' might work.  I sometimes refer
>> to 'abnormal termination', which is probably confusing to some or many
>> people.
>
> (Everyone who worked on IBM mainframes at one time: put the return code
> in R15 and say “ABEND”.)
>
> My framing was between tests that are expected to pass (case 7) and
> tests that are expected to fail (cases 1, 2, 4, and 6). I guess that
> boils down to saying “pass” means produces XML output and does not
> signal an error. That doesn’t define a term for the case where
> everything is valid but S is not in L(G), but I don’t think the
> processor produces any XML in that case, so it’s covered.

Er, we have more terminological issues here.  I think 'pass' and 'fail'
in this context mean 'the processor produces the expected output' and
'the processor does not produce the expected output'.

We need some different terms for 'the processor produces a parse tree'
and 'the processor does not produce a parse tree'.  Earlier in this
thread I have flirted with 'normal' and 'abnormal' termination, but
those terms have other strong associations.  So for the moment I think
I'll go with 'produces (or: does not produce) a parse tree', or in the
context of a test case 'has / does not have an expected result of
assert-xml-*'.

> For case 7, an implementation passes by not signaling any errors and
> producing (one of) the expected XML serializations.

I think you are using the term 'error' to mean whatever mechanism a
processor uses to signal that it has not produced a parse tree.  I wish
you wouldn't, but for now I'll just assume that when you say error you
mean 'no-parse signal', not 'failure to conform to the ixml spec'.

> For cases 1, 2, 4, and 6, it passes only if it signals an error and
> either a or b:
>
>   a. The test does not enumerate the acceptable errors. (Or
>      alternatively, explicitly says that any error is acceptable.)
>
>   b. The test does enumerate acceptable errors and the
>      implementation produced one of them.

Here the terminological issue comes to bear.

In a spec like ours, I think an "error" is a failure to conform to the
spec.  The only errors a processor can correctly signal are failures of
the input grammar to conform to the rules for input grammars; these may
be detected statically (S\d\d codes) or dynamically (D\d\d codes) --
and for various
reasons the spec does not actually say that dynamic errors are errors in
the grammar (nor what they are errors in, if they are not errors in the
grammar).

So far as I know, our spec specifies no error codes for case 1 in
general (although I think there are some that cover particular cases,
which you and Steven have agreed should be removed).  So I don't know
what ixml error codes a processor could signal for that case; and any
ixml error code signaled by a processor in that case would be wrong.

If by "signals an error" you mean here "signals that no-parse has been
produced", then yes, but I wish you would not use "error" in that
sense.  


> For cases 1, 2, 4, and 6, if the processor does not signal an error, it
> fails the test. If it fails to produce an acceptable error, it
> wrong-errors the test.

I don't know what this means.

I think it would be right to say that for cases 1, 2, 4, and 6

  - The test catalog should enumerate all applicable ixml error codes
    (see the sketch after this list).
  
  - If a processor fails to signal that no parse tree was produced, it
    fails the test.

  - If a processor correctly signals that no parse tree was produced
    (and distinguishes between 4 and the others) but either reports an
    inapplicable ixml error code or fails to report an ixml error code
    when at least one is applicable, then it gets a wrong-error score.

  - If a processor correctly signals that no parse tree was produced,
    reports an applicable ixml error code, and refrains from reporting
    any inapplicable ixml error code, then it passes the test.
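
Since my harness is in XQuery, here is a sketch of that scoring rule
as I would implement it.  The function and variable names are mine,
not anything in the catalog or spec: $signalled says whether the
processor signalled that no parse tree was produced, $reported holds
the ixml error codes it reported, and $applicable holds the codes the
catalog enumerates.  (The sketch also glosses over the requirement to
distinguish case 4 from the others.)

    declare function local:score(
      $signalled  as xs:boolean,
      $reported   as xs:string*,
      $applicable as xs:string*
    ) as xs:string {
      if (not($signalled)) then
        (: no signal that no parse tree was produced :)
        'fail'
      else if (exists($reported[not(. = $applicable)])) then
        (: an inapplicable ixml error code was reported :)
        'wrong-error'
      else if (empty($reported) and exists($applicable)) then
        (: no code reported, though at least one applies :)
        'wrong-error'
      else
        (: applicable codes only, or none applicable and none given :)
        'pass'
    };

On this formulation a case-4 test, where $applicable is empty, passes
exactly when the processor signals the no-parse and stays silent about
ixml error codes, which is what I argued above.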

If you are using 'error' to mean 'signal that no parse tree has been
produced', as I think you are, then the rephrasing I just gave should,
I hope, look OK to you, if a bit awkward.  If the rephrasing is not
acceptable, then we have deeper communication issues.

> I don’t think there are any tests that only test for 3, but I don’t
> think there need to be.

A grammar test with an assert-xml or assert-xml-ref expected result
seems to me to test for 3.

> I don’t think there are any tests that test for 5, but I don’t think we
> can write any such tests in an implementation independent manner. I’ve
> never caused an out of memory error. I have some grammars that might,
> but I’ve never let them run to completion because I’m not sure there’s
> enough time left before the heat death of the Universe.

Agreed.


-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
