integrating error codes into the test suite and its schema

Test-suite fans,

We now have a set of prescribed error codes in the spec (hurrah, my
experience with XSLT and XQuery makes me think this can be very helpful,
even if experience with some other specs makes me conscious that it's
not a magic wand).

We should integrate them into the test suite.  But how?

I think three questions are involved; I have suggestions.


Q1 How should grammar-test and test-case elements specify the expected
error codes?

Suggestion 1a: with an attribute on the assertion(s).

So in tests/error/test-catalog, test set hex-much-too-big would have a
grammar test reading something like

    <grammar-test>
      <result>
        <assert-not-a-grammar ixml:error="S07"/>
      </result>
    </grammar-test>

and the test case in that test set would read:

    <test-case name="hex-much-too-big">
      <test-string/>
      <result><assert-not-a-grammar ixml:error="S07"/></result>
    </test-case>
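
For implementors wondering how much work this is on the harness side,
here is a minimal sketch in Python of reading the attribute back out of
a test case. It rests on assumptions: that the ixml prefix is bound to
http://invisiblexml.org/NS (substitute whatever namespace the catalogs
end up using), that the assertion is the first child of result, and the
helper name expected_codes is hypothetical. The sample also omits the
catalog's own namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Assumption: the namespace bound to the ixml: prefix in the catalogs.
IXML_NS = "http://invisiblexml.org/NS"

# The test case from the example above, with the prefix declared.
SAMPLE = f"""
<test-case name="hex-much-too-big" xmlns:ixml="{IXML_NS}">
  <test-string/>
  <result><assert-not-a-grammar ixml:error="S07"/></result>
</test-case>
"""

def expected_codes(test_case):
    """Return the raw ixml:error value of the first assertion under
    <result>, or None if there is no assertion or no such attribute."""
    result = test_case.find("result")
    if result is None or len(result) == 0:
        return None
    assertion = result[0]
    # ElementTree spells namespaced attribute names as {uri}local.
    return assertion.get(f"{{{IXML_NS}}}error")

case = ET.fromstring(SAMPLE)
print(expected_codes(case))  # prints S07
```

Returning None for a missing attribute keeps "not yet annotated"
distinguishable from any explicit value the catalog supplies.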


Q2 What if more than one error code appears to be applicable?

Consider these error codes in the staged spec of 24 May:

    D01 It is an error if the parse tree produced by a grammar cannot be
    represented as well-formed XML.
    
    D02 It is an error if two or more attributes with the same name
    would be serialized on the same element.

If the parse tree produced by a grammar would serialize two or more
attributes with the same name on the same element, then the following
statements all seem to me to be true.

    (a) The condition described under D02 applies.

    (b) Error code D02 is apposite.

    (c) The parse tree produced by the grammar cannot be represented as
    well-formed XML.
    
    (d) The condition described under D01 applies.

    (e) Error code D01 is apposite.

In most cases, I expect implementations will want to return D02 and most
users will want to see D02 since it is more informative than D01.  But I
find it hard to say an implementation is wrong if it returns D01.

(Full disclosure: the current version of Aparecium signals both: it
signals D02 for each attempt to serialize an attribute whose name has
already been used on the element in question, and also signals a D01 at
the top of the result if there are instances of D02 or other
well-formedness errors within the result.)

Suggestion 2a: In the expected result, specify all applicable error
codes, with the meaning "one or more of these is expected".

So test case tests/correct/test-catalog.xml#expr1#expr1 (that's test
catalog tests/correct/test-catalog.xml, test set expr1, test case expr1)
could be written with the following expected result:

        <result>
          <assert-dynamic-error ixml:error="D01 D02"/>
        </result>

Suggestion 2b: If suggestion 2a is adopted, the order of codes given
carries no significance; they are a set.

In practice, we know full well that D02 is preferable, but attempting to
make the order matter is a quick ticket to a lot of questions we do not
want to have to answer.  If individual developers want more detailed
information along quality-of-diagnostics lines, they are welcome to
supply implementation-dependent extension attributes or app-info
elements in the test catalogs.  When the rest of us see how great the
idea is, we'll adopt it, too.
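
To make the set reading of suggestions 2a and 2b concrete, here is a
small Python sketch. The helper name parse_codes is hypothetical, and
the sketch assumes the attribute value is a whitespace-separated list of
codes, as in the example above.

```python
def parse_codes(attr_value):
    """Split a whitespace-separated ixml:error value into an unordered set."""
    return frozenset(attr_value.split())

expected = parse_codes("D01 D02")

# Suggestion 2b: the order written in the catalog carries no significance.
print(expected == parse_codes("D02 D01"))

# Suggestion 2a: "one or more of these is expected", so any non-empty
# subset of the expected set is acceptable.
for reported in ({"D02"}, {"D01"}, {"D01", "D02"}):
    print(sorted(reported), bool(reported) and reported <= expected)
```

Using a set rather than a list means a harness cannot accidentally
start depending on which code is listed first.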


Q3 How do the codes affect reported results?

I assume that for most purposes we all agree that if the expected result
is assert-not-a-sentence with some specified error code(s), then it's
best if the processor reports a parse failure and reports a correct
error code, but it's better to report the wrong error code (or none)
than to fail to notice the parsing failure.

So the current taxonomy of test results needs to change.  Currently it
has four values:

    result-type = 'pass' | 'fail' | 'not-run' | 'other'

Either the middle case (right end result, wrong code) needs a new value,
or we need to be clear that 'pass' may include results with the wrong
error code, or we need to be clear that 'fail' may include results which
are wrong only in not providing the prescribed diagnostic.

When we were discussing error codes, Norm argued (if I understood him
correctly) that in the XQuery/XSLT case getting the error codes wrong
was more often a symptom of a bug in a processor than a symptom of a
different analysis of what was wrong with the input.  So I don't want to
treat wrong-code results as passes.  And for my own self-esteem, I don't
want all of my test-case passes to turn to failures the moment we
add error codes to the catalogs.

Suggestion 3a:  add 'half-pass' as a result-type value.

Suggestion 3b: A 'pass' result means (a) that the processor got the
right result (correct XML, or correct report that the input was not
a conforming grammar, or correct report that the input string is not a
sentence in the language of the input grammar), and (b) in the case of
assert-not-a-grammar, assert-not-a-sentence, and assert-dynamic-error,
the processor supplied one or more of the specified error codes (if any
are specified) and no other code.

Suggestion 3c: A 'half-pass' result means (a) as above but not (b): got
the right overall result but failed to specify an expected error code,
or specified an unexpected error code.

Suggestion 3d: In suggestions 3b and 3c, for the case where error
codes are expected and error codes are reported, the semantics of "pass"
are essentially those of the XPath 3.0 expression

    every $rec in $reported-error-codes[is-ixml-error-code(.)]
    satisfies ($rec = $expected-error-codes)

and the semantics of half-pass are the negation of that test.

For the full range of possibilities, it gets more complicated.

    (: If error codes are both reported and expected, then all reported
       codes must be expected but not vice versa. :)
       
    if (exists($reported-error-codes)
       and exists($expected-error-codes))
    then every $rec in $reported-error-codes[is-ixml-error-code(.)]
         satisfies ($rec = $expected-error-codes)

    (: If error codes are reported but not expected, then all
       reported codes must be non-ixml codes. :)

    else if (exists($reported-error-codes)
       and empty($expected-error-codes))
    then every $rec in $reported-error-codes
         satisfies not(is-ixml-error-code($rec))

    (: If error codes are expected but not reported, then nogo. :)

    else if (empty($reported-error-codes)
       and exists($expected-error-codes))
    then false() (: not a pass :)

    (: If error codes are neither expected nor reported, then pass. :)

    else if (empty($reported-error-codes)
       and empty($expected-error-codes))
    then true() (: pass :)

    else error((), 'logic error in test, call the programmer')
   
and the semantics of "half-pass" are that the correct overall result was
achieved (not-a-sentence, not-a-grammar, dynamic-error) but the
combination of expected and reported error codes did not match the
conditions for a pass.

    Note: The pseudo-code above assumes that the absence of an
    ixml:error attribute on the expected result means none are
    expected. In practice, we'll want to distinguish the none-expected
    situation from the 'we have not yet gotten around to this test case'
    situation, so I expect 'none' will be a legitimate value for the
    ixml:error attribute.

Note in particular that if we have

    <assert-dynamic-error ixml:error="D01 D02"/>

a processor passes by reporting either error D01 or error D02 or both,
and no other ixml errors.  (It may report implementation-defined
errors.)

There are no extra points for using all the error codes specified, and
there are no penalties for using just one.  There *are* penalties for
using an unexpected ixml error code.
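
For anyone who would rather read this as running code than as XPath,
here is a Python sketch of the same case analysis, combined with the
pass / half-pass / fail distinction of suggestions 3a to 3c. It is a
sketch under assumptions, not a reference implementation: the function
names are hypothetical, and the test for "is an ixml error code" assumes
such codes consist of one capital letter followed by two digits (as in
S07, D01, D02), with anything else treated as implementation-defined.

```python
import re

# Assumption: ixml-defined codes look like S07, D01, D02.
IXML_CODE = re.compile(r"[A-Z][0-9]{2}")

def is_ixml_error_code(code):
    return IXML_CODE.fullmatch(code) is not None

def codes_match(reported, expected):
    """Code-level condition for a pass, following the case analysis above."""
    ixml_reported = {c for c in reported if is_ixml_error_code(c)}
    if reported and expected:
        # Every reported ixml code must be expected, but not vice versa.
        return ixml_reported <= set(expected)
    if reported and not expected:
        # Only non-ixml (implementation-defined) codes are allowed.
        return not ixml_reported
    if expected:
        # Codes expected but none reported.
        return False
    # Neither expected nor reported.
    return True

def result_type(right_outcome, reported, expected):
    """right_outcome: did the processor get the overall result right
    (not-a-grammar, not-a-sentence, dynamic-error)?"""
    if not right_outcome:
        return "fail"
    return "pass" if codes_match(reported, expected) else "half-pass"

expected = {"D01", "D02"}
print(result_type(True, {"D02"}, expected))            # pass
print(result_type(True, {"D01", "D02"}, expected))     # pass
print(result_type(True, {"S07"}, expected))            # half-pass
print(result_type(True, set(), expected))              # half-pass
print(result_type(False, {"D02"}, expected))           # fail
```

One thing the sketch makes visible: like the XPath it follows, it gives
a pass to a processor that reports only implementation-defined codes
when ixml codes are expected, whereas the prose of suggestion 3b asks
for at least one of the specified codes. Whichever reading we settle on,
the first branch is the place to change.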


If I can get working consensus on this, I will update the schema for
test catalogs in ixml/schemas to require ixml:error attributes on
appropriate elements.  The value 'none' will mean that there are no relevant
ixml codes: e.g. for bad grammar syntax.  Unless someone objects, I will
define 'working consensus' as one affirmative assent and no objections,
and objectors will have 3 days from today to object, unless there is no
assent during that period, in which case the first reaction will win.

After that, I propose that we systematically update the test catalogs to
provide an ixml:error attribute on all assertions involving abnormal
termination, and that at the same time we update the test catalogs to
use the new namespace.  So old namespace = not-updated, new namespace =
updated.


Comments and counter-proposals welcome.

Michael

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Tuesday, 24 May 2022 18:51:04 UTC