Re: some questions about version information in the test suite from C. M. Sperberg-McQueen on 2010-06-18 (public-xml-schema-testsuite@w3.org from June 2010)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Fri, 18 Jun 2010 12:36:14 -0600
To: Michael Kay <mike@saxonica.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, public-xml-schema-testsuite@w3.org, Henry Thompson <ht@inf.ed.ac.uk>
Message-Id: <E17EE8A9-A6A8-43B6-88AC-2978B27455F0@blackmesatech.com>
On 18 Jun 2010, at 10:24 , Michael Kay wrote:

>
> Conceptually I think we have a set of triples (I, C, O) where I is  
> the input to the test, C is the conditions/configuration under which  
> it should be run, and O is the expected output. Perhaps we are  
> debating whether or not it is useful to group these by common I.

Yes, I think that's so.

One complication, perhaps:  I think that in reality we have
a 4-tuple (I, A, C, O) where I is the input, A is a set of
applicability conditions (preconditions for the test producing
a meaningful result), C is a set of conditions under which
it's run, and O is the prescribed output when the test is
applicable and run under those conditions.

In many simple cases, A and C are related or perhaps identical,
but conceptually I think it's an important distinction, which
can be illustrated by (a) a schema test involving a wildcard
with notQName="##definedSibling" and (b) an instance test in
the same test group.

Test (a) is perfectly meaningful for both a 1.0 and a 1.1
processor:  for a 1.0 processor, using the standard
XML-to-component mappings (which is what I assume
the test suite requires), the schema document in test (a)
does not define a conforming schema, while for a 1.1 processor
the schema does conform.

The instance test (b), by contrast, is meaningful only for a
1.1 processor, because the 1.0 processor has no schema to
validate against.

(This example may be atypical in that this line of reasoning
applies to ALL test groups and configurations:  if any processor
believes the schema to be non-conforming, the instance tests
are ipso facto meaningless for that processor.  But the example
does illustrate, I think, the difference I have in mind between
applicability or meaningfulness of a test on the one hand,
and conditions defining the expected result on the other.)
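
A minimal sketch of the 4-tuple in Python may make the distinction concrete. Everything named here is a hypothetical illustration, not part of the test suite vocabulary: the class `CaseTuple`, the helper functions, the file names, and the "valid" outcome assumed for instance test (b). A and C are modeled as sets of keywords that the processor must support.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class CaseTuple:
    """One (I, A, C, O) tuple; all field names are hypothetical."""
    inputs: str                      # I: the document(s) under test
    applicable_when: FrozenSet[str]  # A: preconditions for a meaningful result
    conditions: FrozenSet[str]       # C: conditions the test is run under
    outcome: str                     # O: prescribed result when A and C hold


def is_applicable(case: CaseTuple, processor: FrozenSet[str]) -> bool:
    """A test is meaningful only if the processor satisfies every keyword in A."""
    return case.applicable_when <= processor


def expected_outcome(case: CaseTuple, processor: FrozenSet[str]) -> Optional[str]:
    """Return O when the test is applicable and run under C, otherwise None."""
    if is_applicable(case, processor) and case.conditions <= processor:
        return case.outcome
    return None


P10 = frozenset({"1.0"})
P11 = frozenset({"1.1"})

# Schema test (a): meaningful for both versions, with different prescribed results.
SCHEMA_A_10 = CaseTuple("wildcard.xsd", frozenset(), P10, "invalid")
SCHEMA_A_11 = CaseTuple("wildcard.xsd", frozenset(), P11, "valid")

# Instance test (b): meaningful only for 1.1 (the outcome here is an assumed value).
INSTANCE_B = CaseTuple("wildcard.xsd instance.xml", P11, P11, "valid")
```

Under this sketch, test (a) is applicable to a 1.0 processor and prescribes "invalid", while test (b) is simply not applicable to it, which is a different statement from prescribing any particular result.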


> I think the test driver is a little simpler if we don't (i.e. if we  
> keep it flat) and reporting/analyzing results might also be a bit  
> easier that way, while maintenance of the test suite is perhaps a  
> little simpler if it is grouped.

Agreed, with the proviso that the difference in simplicity looks
marginal to me.  The real advantage of the grouped structure
seems to me to lie in being able to illuminate the changes to
the spec better.

> What's a muddle, though, is if we do a bit of both.
>
> The flat model would be along the lines
>
> <test>
> <input>...</input>
> <conditions>...</conditions>
> <result>...</result>
> </test>
>
> <test>
> <input>...</input>
> <conditions>...</conditions>
> <result>...</result>
> </test>
>
> The grouped model would be more like
>
> <test>
> <input>...</input>
> <outcome>
> <conditions>...</conditions>
> <result>...</result>
> </outcome>
> <outcome>
> <conditions>...</conditions>
> <result>...</result>
> </outcome>
> </test>
>
> (Of course, if the condition is simple it can be made an attribute  
> of <result> and outcome can then be collapsed; but I'd suggest  
> avoiding that, because the conditions can become complex over time).

Quite right:  if the only conditions we need to express are
"1.0" and "1.1", it's one thing -- but even thinking about how
to handle 1.0 First Edition vs. 1.0 Second Edition makes the
entire thing seem much more complicated.

If the conditions we end up needing, however complex they are,
turn out to be reused over and over (and I expect this to be so,
without knowing why or being able to cite examples), I wonder if
it would be worth while factoring out the definitions of
the conditions somehow, so that:

   - We assign single-token names to possibly complex conditions.
   - The applicability conditions for a test or an expected result
     are expressed as one or more of these single-token names,
     with either the interpretation "applicable if all of these
     apply" or "applicable if any of these apply" -- not sure
     which would make sense in practice.
   - A given test harness needs to know which keywords mean, for
     it, "run this" or "don't run this" (and in some cases what
     effect they have on the parameters to be passed to the processor);
     in many cases this will require manual setup.

To take a simple example, imagine the following keywords with the
following meanings:

   - 1.0 = (for processors) supports XSD 1.0, latest edition
           (for tests) relevant to / expected by 1.0, all editions
   - 1.0-1e = (for processors) supports the language defined in
           1.0 first edition
           (for tests) relevant to / expected by 1.0 1E
   - 1.0-2e = (for processors) supports the language defined in
           1.0 second edition
           (for tests) relevant to / expected by 1.0 2E
   - 1.1 = (for processors) supports XSD 1.1, current draft
           (for tests) relevant to / expected by XSD 1.1

Then a test harness for a 1.0 processor that is being maintained should
run tests whose applicability is labeled "1.0" or "1.0-2e", and produce
the results labeled with those labels.  And if and when a 3E is
produced, both the test suite and the test harness would need to
be updated to make sure the appropriate labels are present and
that they are handled correctly.  A test harness for a 1.0
processor which hasn't been touched since 2003 and which claims
conformance only to 1.0 First Edition would run anything with the
keywords "1.0" or "1.0-1e", but would not touch "1.0-2e".  And so on.
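
The selection rule described above could be sketched as follows, assuming the 'or' interpretation (a test runs if any of its labels is one the harness accepts). The names `MAINTAINED_1_0`, `FROZEN_2003_1_0`, and `should_run` are hypothetical; the label strings are the ones listed above.

```python
# Hypothetical harness configurations: the labels each harness treats as "run this".
MAINTAINED_1_0 = {"1.0", "1.0-2e"}   # maintained 1.0 processor, tracks the latest edition
FROZEN_2003_1_0 = {"1.0", "1.0-1e"}  # untouched since 2003, claims First Edition only


def should_run(labels: str, accepted: set) -> bool:
    """'Or' semantics: run the test if any of its space-separated labels is accepted."""
    return any(label in accepted for label in labels.split())


print(should_run("1.0-2e", MAINTAINED_1_0))   # True
print(should_run("1.0-2e", FROZEN_2003_1_0))  # False
```

If a Third Edition appeared, the only change on the harness side in this sketch would be to the accepted-label sets, which is the kind of manual setup mentioned above.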



On the structure:  for the conditions on results, I keep thinking
about nested structures roughly analogous to choose/when structures.
I think they seem attractive to me because the information which
only applies under certain conditions is wrapped in markup that
expresses those conditions.  If we used keywords with an 'or'
semantics, then one could write

   <test>
     <input/>
     <outcome>
       <when test="1.0-1e"> <result/> </when>
       <when test="1.0-2e 1.1"> <result/> </when>
     </outcome>
   </test>

Or if we end up with 'and'-semantics (and optional features called
'red', 'green', and 'Christmas'):

   <test>
     <input/>
     <outcome>
       <when test="1.0-1e"> <result/> </when>
       <when test="1.0-2e"> <result/> </when>
       <when test="1.1 red"> <result/> </when>
       <when test="1.1 Christmas"> <result/> </when>
     </outcome>
   </test>
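
To show how a harness might read such a structure, here is a rough sketch using Python's standard `xml.etree.ElementTree`. It assumes first-match-wins evaluation, as in choose/when, and it adds a hypothetical `validity` attribute to `<result>` so that the selected result is visible. The function name `select_result` and its `semantics` parameter are likewise assumptions, not proposed vocabulary.

```python
import xml.etree.ElementTree as ET

OR_EXAMPLE = """<test>
  <input/>
  <outcome>
    <when test="1.0-1e"><result validity="invalid"/></when>
    <when test="1.0-2e 1.1"><result validity="valid"/></when>
  </outcome>
</test>"""

AND_EXAMPLE = """<test>
  <input/>
  <outcome>
    <when test="1.0-1e"><result validity="invalid"/></when>
    <when test="1.1 red"><result validity="valid"/></when>
    <when test="1.1 Christmas"><result validity="invalid"/></when>
  </outcome>
</test>"""


def select_result(test_xml, supported, semantics="or"):
    """Return the <result> of the first <when> that matches, or None if none does."""
    root = ET.fromstring(test_xml)
    for when in root.iterfind("outcome/when"):
        keywords = when.get("test", "").split()
        if semantics == "or":
            matched = any(k in supported for k in keywords)
        else:
            # 'and' semantics; an empty keyword list is treated as no match
            matched = bool(keywords) and all(k in supported for k in keywords)
        if matched:
            return when.find("result")
    return None
```

With 'and' semantics a processor supporting only "1.1" matches none of the branches in the second example, so a real design would presumably need a fallback analogous to `otherwise`.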

> I've abstracted away from detailed questions about what <test> is:  
> but in practice this is relevant because we have a testGroup  
> containing a number of tests each with its own outcome, and this  
> makes it a bit difficult to use the grouped structure if the same  
> conditions apply to all tests within the group.

And our test groups have the additional complication that they all
share a common schema.


> I think it's this complexity that tends to lead me to the "flat"  
> structure, where the conditions (including of course the version)  
> are always written at the level of the <testGroup> element.

If we were to atomize things sufficiently, a really flat
structure could look like

  <test input="x.xsd y.xsd z.xml" applicable-when="1.0"
        prescribed-result="indeterminate"/>
  <test input="x.xsd y.xsd z.xml" applicable-when="1.1"
        prescribed-result="invalid"/>

which would have the advantage of being easy to group (or
search) for tests with the same input and different results,
or the same applicability conditions.
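
As a small illustration of that advantage, the two flat records can be held as plain dictionaries and grouped on any attribute. This is only a sketch: the `TESTS` list mirrors the attributes used above, and the helpers `group_by` and `inputs_with_differing_results` are hypothetical names.

```python
from collections import defaultdict

# Flat records mirroring the attributes of the two <test/> elements above.
TESTS = [
    {"input": "x.xsd y.xsd z.xml", "applicable-when": "1.0",
     "prescribed-result": "indeterminate"},
    {"input": "x.xsd y.xsd z.xml", "applicable-when": "1.1",
     "prescribed-result": "invalid"},
]


def group_by(tests, key):
    """Group flat test records by the value of one attribute."""
    groups = defaultdict(list)
    for test in tests:
        groups[test[key]].append(test)
    return dict(groups)


def inputs_with_differing_results(tests):
    """List the inputs for which more than one distinct result is prescribed."""
    return sorted(
        inp for inp, members in group_by(tests, "input").items()
        if len({m["prescribed-result"] for m in members}) > 1
    )


print(inputs_with_differing_results(TESTS))  # ['x.xsd y.xsd z.xml']
```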

I don't really want to suggest this, at least not without some
way to group tests, because I at least often draft tests in
groups that can share common documentation and description.

But I do think that a 'flat' design would be less unattractive
to me if it were really flat.  In the old vocabulary design,
each instance test has one instance document and one expected
result, so that in a typical instance test

   <instanceTest version="1.0" name="e1ie1i.xml">
    <instanceDocument xlink:href="../wgData/sg/e1ie1.xml"/>
    <expected validity="invalid"/>
   </instanceTest>

more than half of the characters are contributing no information
whatever, and

   <instance name="e1ie1i.xml"
             conditions="1.0"
             doc="../wgData/sg/e1ie1.xml"
             outcome="invalid"/>

would convey all the same information.
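
The mapping between the two forms is mechanical, as the following sketch with Python's standard `xml.etree.ElementTree` suggests. It assumes the simplified, un-namespaced element names shown above plus the standard XLink namespace for `xlink:href`; the function name `to_compact` is hypothetical, and real testSet documents would need their own namespace handled.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

OLD = """<instanceTest xmlns:xlink="http://www.w3.org/1999/xlink"
    version="1.0" name="e1ie1i.xml">
  <instanceDocument xlink:href="../wgData/sg/e1ie1.xml"/>
  <expected validity="invalid"/>
</instanceTest>"""


def to_compact(old_xml: str) -> ET.Element:
    """Translate an old-style instanceTest into the compact <instance/> form."""
    old = ET.fromstring(old_xml)
    return ET.Element("instance", {
        "name": old.get("name"),
        "conditions": old.get("version"),
        # ElementTree exposes namespaced attributes as "{namespace-uri}local-name"
        "doc": old.find("instanceDocument").get(f"{{{XLINK}}}href"),
        "outcome": old.find("expected").get("validity"),
    })


compact = to_compact(OLD)
print(ET.tostring(compact, encoding="unicode"))
```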

Are we contemplating the possibility of such large structural
changes?  Or are we only using these sketches of possible syntax
as ways to clarify the conceptual structures we need, so that
we can design a syntax that allows the existing metadata to
remain applicable? (Of course, there are only forty-five or
so testSet documents; if we made radical structural changes to
the metadata syntax, it would not be hard to translate them
all.)

I'm open to structural changes, but I'm assuming for now that
our initial focus should be on getting the concepts right,
so we can worry about syntax later.


-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Friday, 18 June 2010 18:36:46 UTC