- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Fri, 18 Jun 2010 12:36:14 -0600
- To: Michael Kay <mike@saxonica.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, public-xml-schema-testsuite@w3.org, Henry Thompson <ht@inf.ed.ac.uk>
On 18 Jun 2010, at 10:24, Michael Kay wrote:

> Conceptually I think we have a set of triples (I, C, O) where I is
> the input to the test, C is the conditions/configuration under which
> it should be run, and O is the expected output. Perhaps we are
> debating whether or not it is useful to group these by common I.

Yes, I think that's so.

One complication, perhaps: I think that in reality we have a 4-tuple
(I, A, C, O) where I is the input, A is a set of applicability
conditions (preconditions for the test producing a meaningful
result), C is a set of conditions under which it's run, and O is the
prescribed output when the test is applicable and run under those
conditions.

In many simple cases, A and C are related or perhaps identical, but
conceptually I think it's an important distinction, which can be
illustrated by (a) a schema test involving a wildcard with
notQName="##definedSibling" and (b) an instance test in the same
test group.

Test (a) is perfectly meaningful for both a 1.0 and a 1.1 processor:
for a 1.0 processor, using the standard XML-to-component mappings
(which is what I assume the test suite requires), the schema document
in test (a) does not define a conforming schema, while for a 1.1
processor the schema does conform. The instance test (b), by
contrast, is meaningful only for a 1.1 processor, because the 1.0
processor has no schema to validate against.

(This example may be atypical in that this line of reasoning applies
to ALL test groups and configurations: if any processor believes the
schema to be non-conforming, the instance tests are ipso facto
meaningless for that processor. But the example does illustrate, I
think, the difference I have in mind between applicability or
meaningfulness of a test on the one hand, and conditions defining the
expected result on the other.)

> I think the test driver is a little simpler if we don't (i.e. if we
> keep it flat) and reporting/analyzing results might also be a bit
> easier that way, while maintenance of the test suite is perhaps a
> little simpler if it is grouped.

Agreed, with the proviso that the difference in simplicity looks
marginal to me. The real advantage of the grouped structure seems to
me to lie in being able to illuminate the changes to the spec better.

> What's a muddle, though, is if we do a bit of both.
>
> The flat model would be along the lines
>
> <test>
>   <input>...</input>
>   <conditions>...</conditions>
>   <result>...</result>
> </test>
>
> <test>
>   <input>...</input>
>   <conditions>...</conditions>
>   <result>...</result>
> </test>
>
> The grouped model would be more like
>
> <test>
>   <input>...</input>
>   <outcome>
>     <conditions>...</conditions>
>     <result>...</result>
>   </outcome>
>   <outcome>
>     <conditions>...</conditions>
>     <result>...</result>
>   </outcome>
> </test>
>
> (Of course, if the condition is simple it can be made an attribute
> of <result> and outcome can then be collapsed; but I'd suggest
> avoiding that, because the conditions can become complex over time).

Quite right: if the only conditions we need to express are "1.0" and
"1.1", it's one thing -- but even thinking about how to handle 1.0
First Edition vs. 1.0 Second Edition makes the entire thing seem much
more complicated.

If the conditions we end up needing, however complex they are, turn
out to be reused over and over (and I expect this to be so, without
knowing why or being able to cite examples), I wonder if it would be
worthwhile factoring out the definitions of the conditions somehow,
so that:

- We assign single-token names to possibly complex conditions.
- The applicability conditions for a test or an expected result are
  expressed as one or more of these single-token names, with either
  the interpretation "applicable if all of these apply" or
  "applicable if any of these apply" -- not sure which would make
  sense in practice.
- A given test harness needs to know which keywords mean, for it,
  "run this" or "don't run this" (and in some cases what effect they
  have on the parameters to be passed to the processor); in many
  cases this will require manual setup.

To take a simple example, imagine the following keywords with the
following meanings:

- 1.0 = (for processors) supports XSD 1.0, latest edition
        (for tests) relevant to / expected by 1.0, all editions
- 1.0-1e = (for processors) supports the language defined in 1.0
           first edition
           (for tests) relevant to / expected by 1.0 1E
- 1.0-2e = (for processors) supports the language defined in 1.0
           second edition
           (for tests) relevant to / expected by 1.0 2E
- 1.1 = (for processors) supports XSD 1.1, current draft
        (for tests) relevant to / expected by XSD 1.1

Then a test harness for a 1.0 processor that is being maintained
should run tests whose applicability is labeled "1.0" or "1.0-2e",
and produce the results labeled with those labels. And if and when a
3E is produced, both the test suite and the test harness would need
to be updated to make sure the appropriate labels are present and
that they are handled correctly.

A test harness for a 1.0 processor which hasn't been touched since
2003 and which claims conformance only to 1.0 First Edition would run
anything with the keywords "1.0" or "1.0-1e", but would not touch
"1.0-2e". And so on.

On the structure: for the conditions on results, I keep thinking
about nested structures roughly analogous to choose/when structures.
I think they seem attractive to me because the information which only
applies under certain conditions is wrapped in markup that expresses
those conditions.
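
As a minimal sketch of the keyword scheme described above (not a
proposal for the actual harness), the following Python shows how a
harness might decide whether a labeled test applies to it under
either the "any" or the "all" interpretation. The keyword tokens are
the ones listed above; the harness names, the HARNESS_KEYWORDS table,
and the is_applicable function are hypothetical, and treating an
unlabeled test as not applicable is an assumption of the sketch.

```python
from typing import Iterable

# Hypothetical harness configurations: the keywords each harness treats
# as "run this". The tokens come from the keyword list in the message;
# the harness names and this table are illustrative assumptions.
HARNESS_KEYWORDS = {
    # A maintained 1.0 processor: latest edition of 1.0.
    "maintained-1.0": {"1.0", "1.0-2e"},
    # A 1.0 processor untouched since 2003: First Edition only.
    "frozen-1.0-1e": {"1.0", "1.0-1e"},
}


def is_applicable(labels: str, accepted: Iterable[str], mode: str = "any") -> bool:
    """Return True if a test labeled with `labels` should be run.

    `labels` is a whitespace-separated list of keywords, as in an
    attribute value. `mode` selects the interpretation: "any" means
    applicable if any keyword is accepted, "all" means applicable only
    if every keyword is accepted.
    """
    accepted_set = set(accepted)
    tokens = labels.split()
    if not tokens:
        # Assumption: a test with no labels is not run.
        return False
    if mode == "any":
        return any(token in accepted_set for token in tokens)
    if mode == "all":
        return all(token in accepted_set for token in tokens)
    raise ValueError("mode must be 'any' or 'all'")


if __name__ == "__main__":
    for name, accepted in HARNESS_KEYWORDS.items():
        for labels in ("1.0", "1.0-1e", "1.0-2e 1.1"):
            print(name, repr(labels), is_applicable(labels, accepted))
```

Which of the two modes is the right one is exactly the open question
above; the sketch only shows that the choice can be confined to one
place in a harness.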

If we used keywords with an 'or' semantics, then one could write

  <test>
    <input/>
    <outcome>
      <when test="1.0-1e">
        <result/>
      </when>
      <when test="1.0-2e 1.1">
        <result/>
      </when>
    </outcome>
  </test>

Or if we end up with 'and'-semantics (and optional features called
'red', 'green', and 'Christmas'):

  <test>
    <input/>
    <outcome>
      <when test="1.0-1e">
        <result/>
      </when>
      <when test="1.0-2e">
        <result/>
      </when>
      <when test="1.1 red">
        <result/>
      </when>
      <when test="1.1 Christmas">
        <result/>
      </when>
    </outcome>
  </test>

> I've abstracted away from detailed questions about what <test> is:
> but in practice this is relevant because we have a testGroup
> containing a number of tests each with its own outcome, and this
> makes it a bit difficult to use the grouped structure if the same
> conditions apply to all tests within the group.

And our test groups have the additional complication that they all
share a common schema.

> I think it's this complexity that tends to lead me to the "flat"
> structure, where the conditions (including of course the version)
> are always written at the level of the <testGroup> element.

If we were to atomize things sufficiently, a really flat structure
could look like

  <test input="x.xsd y.xsd z.xml"
        applicable-when="1.0"
        prescribed-result="indeterminate"/>
  <test input="x.xsd y.xsd z.xml"
        applicable-when="1.1"
        prescribed-result="invalid"/>

which would have the advantage of being easy to group (or search) for
tests with the same input and different results, or the same
applicability conditions.

I don't really want to suggest this, at least not without some way to
group tests, because I at least often draft tests in groups that can
share common documentation and description. But I do think that a
'flat' design would be less unattractive to me if it were really
flat.
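
To make the grouping point about the fully flat form concrete, here
is a rough Python sketch that reads flat <test/> elements and groups
them by input, then lists inputs whose prescribed results differ
across conditions. The attribute names follow the flat sketch above;
the <tests> wrapper element and the function names are assumptions
made only for the illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The <tests> wrapper is a hypothetical container added so the fragment
# parses as one document; the attributes follow the flat sketch.
FLAT_XML = """
<tests>
  <test input="x.xsd y.xsd z.xml"
        applicable-when="1.0"
        prescribed-result="indeterminate"/>
  <test input="x.xsd y.xsd z.xml"
        applicable-when="1.1"
        prescribed-result="invalid"/>
</tests>
"""


def group_by_input(xml_text):
    """Map each input string to {applicability keyword: prescribed result}."""
    root = ET.fromstring(xml_text.strip())
    groups = defaultdict(dict)
    for test in root.iter("test"):
        condition = test.get("applicable-when")
        groups[test.get("input")][condition] = test.get("prescribed-result")
    return dict(groups)


def inputs_with_differing_results(groups):
    """Return inputs whose prescribed result varies across conditions."""
    return [
        test_input
        for test_input, outcomes in groups.items()
        if len(set(outcomes.values())) > 1
    ]


if __name__ == "__main__":
    grouped = group_by_input(FLAT_XML)
    print(grouped)
    print(inputs_with_differing_results(grouped))
```

The same grouping could of course be done with a stylesheet; the
point is only that a truly flat form makes it a one-pass operation.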

In the old vocabulary design, each instance test has one instance
document and one expected result, so that in a typical instance test

  <instanceTest version="1.0" name="e1ie1i.xml">
    <instanceDocument xlink:href="../wgData/sg/e1ie1.xml"/>
    <expected validity="invalid"/>
  </instanceTest>

more than half of the characters are contributing no information
whatever, and

  <instance name="e1ie1i.xml"
            conditions="1.0"
            doc="../wgData/sg/e1ie1.xml"
            outcome="invalid"/>

would convey all the same information.

Are we contemplating the possibility of such large structural
changes? Or are we only using these sketches of possible syntax as
ways to clarify the conceptual structures we need, so that we can
design a syntax that allows the existing metadata to remain
applicable? (Of course, there are only forty-five or so testSet
documents; if we made radical structural changes to the metadata
syntax, it would not be hard to translate them all.)

I'm open to structural changes, but I'm assuming for now that our
initial focus should be on getting the concepts right, so we can
worry about syntax later.

--
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Friday, 18 June 2010 18:36:46 UTC