Re: some questions about version information in the test suite from C. M. Sperberg-McQueen on 2010-06-22 (public-xml-schema-testsuite@w3.org from June 2010)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Mon, 21 Jun 2010 18:39:37 -0600
To: Michael Kay <mike@saxonica.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, public-xml-schema-testsuite@w3.org, Mary Holstege <holstege@mathling.com>
Message-Id: <09798B5F-80B2-46C5-99FC-4DB36260F091@blackmesatech.com>
On 21 Jun 2010, at 15:16 , Michael Kay wrote:

 > I'm getting to the point where uncertainty here is starting to hold
 > up progress.

 > Could I suggest the following as a way of moving forward:

Thank you for this suggestion.  I like it in some ways and not in all
ways; to try to move forward I have made specific counter-proposals
and given examples of (my alternative proposed) usage.

 > Assume that we are testing a processor that claims to implement XSD
 > version P; if we want to test multiple processors or a processor
 > that implements multiple schema versions, repeat the steps below
 > with multiple values of P

I'll call this idea for the basic model 'proposal P'.

Proposal P works, I think, for a flat space of mutually exclusive
versions; it does not seem to me to work well with changes in
editions, with optional features (which to be sure XSD 1.0 and 1.1
don't explicitly admit to having) or with implementation-defined
features with a finite (and small) set of possible values.

As an example of the last, consider the case of complex types whose
content model is an all-group, which are restrictions of some other
content model, and which violate Schema Component Constraint:
Derivation Valid (Restriction, Complex) of section 3.4.6.3.  Three
legal behaviors are distinguished; it is implementation-defined which
of these behaviors a processor implements:

   (a) Processor always detects the violation of the constraint
       by examining the schema in isolation (so it should
       produce result 'invalid' in the schema test).
   (b) Processor detects such violations only when it sees an
       instance of the type valid against the restriction but
       not valid against the base.  (Expected schema test result
       is 'valid', expected instance test result is
       'runtime-schema-error' for a suitable instance,
       and 'valid' or 'invalid' for others.)
   (c) Sometimes one, sometimes the other.

Mary Holstege (whom I'm cc'ing on this since I don't believe she is
subscribed to the test suite comments list) has suggested that the
test suite mechanism needs to be able to handle situations like this;
I think she's right.  I don't want to rush out and complicate the
test suite with a bunch of complex cases, but I want to get the
foundations clean so we can handle complex cases without having to
tear up the floor later.

So as an alternative to proposal P, let me suggest proposal Q,
slightly more complicated than P, able to handle more complex cases
than P, but equally simple (in fact identical to P) in all the
cases handled by P.

Assume that the distinctions we need to capture are summarized in a
set of binary features.  (These can be thought of as boolean variables
describing a test, an implementation, and/or an expected result.)
We will start with a very simple set and add new features with some
caution since they complicate understanding.  If we started with the
two features

  1.0 = "XSD version 1.0"
  1.1 = "XSD version 1.1"

then proposals P and Q would amount to exactly the same thing.  To
illustrate the more complicated cases, I'll propose beginning with
some more binary features:

  1.0-1e = "XSD 1.0 first edition"
  1.0-2e = "XSD 1.0 second edition"
  CTR-all-compile = "detects violations of 3.4.6.3 by restrictions
    to all-groups by examination of schema in isolation (at
    'compile time')"
  CTR-all-runtime = "detects violations of 3.4.6.3 by restrictions
    to all-groups only when presented with a violating instance
    (at 'run time')"
  CTR-all-idep = "implementation-defined behavior with respect to
    detecting violations of 3.4.6.3 by restrictions to all-groups:
    may detect them at compile time, may detect them at run time"
  XML1.0 = XML-dependent datatypes based on XML 1.0 first through
    fourth editions (Not sure what to do about XML 1.0 fifth edition,
    ignoring it for now; when we have some tests in that area we
    may know more about whether we need a separate feature.)
  XML1.1 = XML-dependent datatypes based on XML 1.1

These binary features can be used to describe (a) implementations, (b)
conditions for the meaningfulness of tests, and (c) conditions for the
given expected results.  They are not completely independent (for
example, every 1.1 processor should according to the spec implement
exactly one of CTR-all-compile, CTR-all-runtime, or CTR-all-idep), but
the logical constraints are expressed only in the prose where these
tokens are defined (namely in the appropriate type of the schema for
test suite metadata).  The prose should group the features into
mutually exclusive sets; any implementation will, in any particular
configuration, support at most one feature in each set.  (Many
implementations will in fact support multiple features from the same
set and allow run-time choice among them.)  For the features mentioned
above,

   1.0 and 1.1 are mutually exclusive.
   1.0-1e and 1.0-2e are mutually exclusive.
   The three CTR-all-* features are mutually exclusive.
   XML1.0 and XML1.1 are mutually exclusive.

Each implementation can be regarded as implementing (or: supporting) a
particular set C of these features; implementations which can run in
multiple configurations implement multiple sets of features; for such
implementations, run the test suite for multiple configurations C.
For example:

   - A 1.0 processor implements "1.0" and either "1.0-1e" or
     "1.0-2e".
   - A 1.1 processor implements "1.1" and exactly one of
     CTR-all-compile, CTR-all-runtime, CTR-all-idep.
   - A 1.1 processor may implement XML1.0, XML1.1, or both, but
     any configuration used in a validation episode will use
     the XML1.0 feature or the XML1.1 feature, not both.
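
As a rough illustration of the grouping above, here is a minimal
Python sketch that checks whether a configuration C is consistent.
The feature tokens are the ones proposed in this message; the names
EXCLUSIVE_GROUPS, conflicts, and is_valid_configuration are
hypothetical and not part of any test suite tooling.

```python
# Illustrative sketch only: the tokens come from the proposal, the
# function and variable names are assumptions made for this example.
EXCLUSIVE_GROUPS = [
    {"1.0", "1.1"},
    {"1.0-1e", "1.0-2e"},
    {"CTR-all-compile", "CTR-all-runtime", "CTR-all-idep"},
    {"XML1.0", "XML1.1"},
]


def conflicts(features):
    """Return, per exclusive group, the features that clash in `features`."""
    features = set(features)
    return [
        sorted(group & features)
        for group in EXCLUSIVE_GROUPS
        if len(group & features) > 1
    ]


def is_valid_configuration(features):
    """A configuration may hold at most one feature from each group."""
    return not conflicts(features)
```

A processor that supports both XML1.0 and XML1.1 would then be
described by two configurations, each of which passes this check
on its own.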

Each test suite, test set, test group, and test can be regarded as
meaningful for processors supporting, and meaningless for processors
not supporting, particular features.

Each expected test result applies in the presence of some set of
features.

 > (a) The version attribute can appear on testSuite (replacing
 > schemaVersion), testSet, testGroup, schemaTest, or instanceTest. In
 > each case the semantics are "if the attribute is present and
 > includes the value P, then run this test / these tests; if it is
 > present but does not include P, then skip this test. If the version
 > attribute is absent, then run the test".

Proposal Q is very similar.  I'll stick with 'version' as the name of
the attribute for now.

The version attribute can appear on testSuite, testSet, testGroup,
schemaTest, or instanceTest.  In each case the semantics are

   "If the attribute is present and includes the name of any feature
   supported by the implementation being tested, then run the test(s).
   Conversely, if every feature listed is unsupported, skip the
   test(s).  If the attribute is not given or lists no features, run
   the test(s)."

So for test suites, sets, groups, and tests,

   version="x y z"

means that any implementation configuration C which supports any of
the features x, y, or z should run the tests.  If C only supports w,
v, and u, then it should skip the tests.  More examples:

   version="1.0 1.1" // both 1.0 and 1.1 processors should run
   version="XML1.1" // only processor configurations supporting
                    // the 1.1-based datatypes (NOT the same as
                    // supporting XML 1.1 input, by the way) should
                    // bother to run these tests.
   version="1.0-1e" // test is relevant only for implementations of 1.0 1E
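
The run/skip rule for suites, sets, groups, and tests can be sketched
as follows.  This is only an illustration under the semantics stated
above; the function name should_run and its parameters are
hypothetical.

```python
# Illustrative sketch of the proposal Q run/skip rule; `should_run`
# is a made-up name, not part of any existing test driver.
def should_run(version, supported):
    """Decide whether configuration C should run the test(s).

    version   -- value of the version attribute, or None if absent
    supported -- set of feature tokens supported by configuration C
    """
    listed = (version or "").split()
    if not listed:
        # Attribute absent, or present but listing no features.
        return True
    # Run if ANY listed feature is supported.
    return any(feature in supported for feature in listed)
```

For instance, should_run("XML1.1", {"1.1", "XML1.0"}) is false, so a
1.1 configuration using the XML 1.0-based datatypes skips those tests.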

 > (b) The version attribute can appear on <expected>, meaning "these
 > are the expected results for a processor at a given version". The
 > test driver should find the <expected> element that either has no
 > version attribute or whose version attribute includes P. There must
 > be exactly one such element.

Proposal Q: The version attribute can appear on <expected>, meaning
"these are the expected results for any processor supporting all of
the features mentioned." The test driver should find the <expected>
element that either has no version attribute or whose version
attribute includes features all of which are supported in the current
configuration C. There must be at most one such element.

Note: The last rule is most easily satisfied by ensuring that in any
given pair of <expected> elements for the same test, each has a
version attribute which mentions some feature incompatible with some
feature mentioned in the other element's version attribute.

Note: it's a logical error (in theory enforceable by an assertion, but
probably not checked by the schema for test suite metadata) to have
mutually exclusive feature tokens in the same version value.
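
The first note suggests a simple pairwise check on the version values
of the <expected> elements of one test.  A minimal sketch, assuming
the mutually exclusive sets listed earlier (the names incompatible
and unambiguous are hypothetical):

```python
from itertools import combinations

# Illustrative sketch; tokens come from the proposal, names do not.
EXCLUSIVE_GROUPS = [
    {"1.0", "1.1"},
    {"1.0-1e", "1.0-2e"},
    {"CTR-all-compile", "CTR-all-runtime", "CTR-all-idep"},
    {"XML1.0", "XML1.1"},
]


def incompatible(a, b):
    """True if a and b are distinct features of one exclusive group."""
    return a != b and any(a in g and b in g for g in EXCLUSIVE_GROUPS)


def unambiguous(versions):
    """True if every pair of version values is mutually incompatible.

    versions -- the version attribute values of all <expected>
                elements of one test ("" for a missing attribute)
    """
    token_lists = [v.split() for v in versions]
    return all(
        any(incompatible(a, b) for a in left for b in right)
        for left, right in combinations(token_lists, 2)
    )
```

This check is sufficient rather than necessary: a set of <expected>
elements that fails it might still never match the same configuration
in practice, but one that passes it cannot.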

Examples:

   <expected version="1.0" .../> // 1.0 only
   <expected version="1.1" .../> // 1.1 only
   <expected .../>               // both 1.0 and 1.1

   <expected version="1.0 1.1" .../>
      // Contradiction: this means both 1.0 and 1.1 support
      // in the same run.  Not logically possible.

   <expected version="1.1 CTR-all-compile" .../>
      // expected results for 1.1 processors which claim
      // compile-time detection of schema errors in the case
      // of restriction to all-groups.

   <expected version="CTR-all-compile" .../>
      // expected results for processors which claim
      // compile-time detection of schema errors in the case
      // of restriction to all-groups.  (In practice, such
      // processors will probably all be 1.1 processors, but
      // we don't have to say so here.)
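
The selection rule for <expected> could be sketched like this.  It is
an illustration only: select_expected and its input shape (a list of
version/outcome pairs) are assumptions, not an existing API.

```python
# Illustrative sketch of the proposal Q rule for <expected>; the
# names and the (version, outcome) pair representation are made up.
def matches(version, supported):
    """True if ALL features listed in version are supported.

    A missing (None) or empty version attribute matches everything.
    """
    return all(f in supported for f in (version or "").split())


def select_expected(expected, supported):
    """Pick the expected outcome applicable to configuration C.

    expected  -- list of (version, outcome) pairs, one per <expected>
    supported -- set of feature tokens supported by configuration C
    Returns the outcome, or None if no <expected> element applies.
    """
    hits = [outcome for version, outcome in expected
            if matches(version, supported)]
    if len(hits) > 1:
        raise ValueError("more than one <expected> element applies")
    return hits[0] if hits else None
```

Note that version="1.0 1.1" on <expected> never matches a consistent
configuration, since no configuration supports both features at once.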


 > (c) Change the value of <expected> to be a list of permitted
 > outcomes, where any of the listed outcomes is considered to pass the
 > test. For example

 >  <expected version="1.0">invalid notKnown</expected>
 >  <expected version="1.1">invalid</expected>

 > indicates that acceptable results for 1.0 are "invalid" and
 > "notKnown", whereas for 1.1 the only acceptable result is "invalid"
 > This replaces the impDe attribute.

This is orthogonal to the differences between what I've been calling
proposal P and proposal Q and I'd like to decide it separately.

I support eliminating the implDe attribute but believe I would prefer
an alternative like that given in the 'expected-outcome' type of

http://www.w3.org/XML/2008/xsdl-exx/ancillary/xsts-schema.sketch.xml#type_expected-outcome

namely: allow the expected outcomes 'valid', 'invalid', 'notKnown',
'implementation-defined', 'implementation-dependent', 'indeterminate',
'invalid-latent', and 'runtime-schema-error'.

The reasons I'm conscious of are:

   - It's helpful for the WG and for users of the spec (also
     implementors) to distinguish implementation-defined or
     -dependent behavior from cases where the spec is just
     vague or unclear or contradictory.
   - Allowing some non-empty subset of 'valid', 'invalid', and
     'notKnown' suggests that all combinations are possible
     and distinct.  If there can really be cases where
     'valid' and 'notKnown' are possible but not 'invalid',
     or 'valid' and 'invalid' but not 'notKnown', then maybe
     this is a better approach.  But my immediate guess is
     that all such values are errors for 'valid invalid
     notKnown'.

 > As a matter of test suite design, the aim should be to put the
 > version attribute on the smallest element possible.

I like where this goes, but don't you mean 'the largest element
possible'?  If a whole test set is really 1.1 only, put version="1.1"
on the testSet element.  If that's not possible, mark the individual
test groups, or individual tests.

 > Where XSD 1.0 and XSD 1.1 exhibit interestingly different behaviors,
 > there should be expected results for both. However, where the test
 > uses XSD 1.1 syntax that will be rejected out of hand by an XSD 1.0
 > processor, there is little point in making XSD 1.0 processors run
 > the test; it should simply be marked as version="1.1" typically at
 > the testGroup or testSet level, rather than giving expected results
 > for 1.0.

As regards schema tests, I'd be more enthusiastic about that proposal
if there hadn't been a report on this list that a 1.0 processor with a
good reputation had accepted a 1.1 schema with a negative wildcard as
a legal schema, presumably because it just ignored the notQName
attribute.

In principle, I agree that the schema tests involving constructs valid
in a 1.1 schema document and invalid in a 1.0 schema document are not
interesting; it is only in practice that they turn out to be
interesting.

I can accept the principle, however, if we can agree explicitly that
any schema test marked as 1.1-only has the expected outcome
'non-conforming' (or 'invalid') for any 1.0 processor.  1.0
implementors who want to check their processors' strictness in
detecting invalid schema documents can run the 1.1 tests; in the
normal course of events 1.0 maintenance developers will skip the 1.1
tests.

So OK, I think I've talked myself into agreeing on this point,
contrary to what I thought before.



-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Tuesday, 22 June 2010 00:40:09 UTC