Re: Test harness results xml format

James Graham wrote:
> On 07/27/2010 03:28 AM, Kris Krueger wrote:
>> As discussed at the last meeting this is the format of what the test 
>> harness could expose as xml.
>> This could be used to  build an implementation report, if you have 
>> feedback (add data, remove data) please speak up.
> 
>> <?xml version="1.0" encoding="utf-8"?>
>> <testRun version="1.0">
>>    <userAgent>9.0.101101</userAgent>
>>    <browserName>Internet Explorer</browserName>
>>    <Date>07-26-2010</Date>
>>    <Submitter>Microsoft Test Team</Submitter>
>>    <targetTestSuiteName>W3C HTML5 Test Suite Ver 
>> 1.0</targetTestSuiteName>
>>    <targetTestSuiteID>W3CHTML5-10</targetTestSuiteID>
>>    <Tests>
>>      <Test Uri="/w3c/tests/html5/Chapt2/sect2.1.1.3">
>>        <featureName>Canvas</featureName>
>>        <testName>Object literal - Get Set property</testName>
>>        <specReference>7.2, 7.5</specReference>
>>        <Result>Pass|Fail|Not Implemented</Result>
>>      </Test>
>>      <Test Uri="/w3c/tests/html5/Chapt2/sect2.1.1.4">
>>        <featureName>Canvas</featureName>
>>        <testName>Object literal - Get Set property</testName>
>>        <specReference>7.2, 7.5</specReference>
>>        <Result>Pass|Fail|Not Implemented</Result>
>>      </Test>
>>    </Tests>
>> </testRun>
> 
> I'm not sure what the use cases / requirements we are trying to address 
> here are, so it is impossible for me to judge what is needed from the 
> above. That said, I would a-priori prefer a simpler json format, like 
> (in some mixture of json and pseudo-bnf):
> 
> {
> "ua":ua-id,
> "results":{(test-id:result)*}
> }

It seems sensible to avoid providing the 
featureName/testName/specReference in the test results, because 
that's duplicating information that can already be derived from the test 
id and revision number - it's denormalising the data and introducing 
scope for errors. A simple script can join the minimal test result data 
against the test repository to generate a detailed status report.
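
(As a rough sketch of what I mean - assuming a minimal results file along 
the lines of the {ua, results} JSON James suggests above, plus a 
hypothetical manifest.json derived from the test repository that maps 
test ids to feature/spec metadata; the file names and fields are made up 
for illustration:)

import json

# results.json is assumed to hold {"ua": ..., "results": {test-id: result}};
# manifest.json is assumed to hold {test-id: {"feature": ..., "specReference": ...}}
# generated from the test repository itself.
with open("results.json") as f:
    run = json.load(f)
with open("manifest.json") as f:
    manifest = json.load(f)

print("Report for %s" % run["ua"])
for test_id, result in sorted(run["results"].items()):
    meta = manifest.get(test_id, {})
    print("%s\t%s\t%s\t%s" % (test_id, result,
                              meta.get("feature", "?"),
                              meta.get("specReference", "?")))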

For http://philip.html5.org/tests/canvas/suite/tests/results.html I used 
a results format like:

- ua: "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; 
SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media 
Center PC 6.0)"
   results:
   - id: 2d.composite.solid.source-over
     status: PASS
     notes: '%3CLI%3EPassed%3C/LI%3E'
   - id: 2d.composite.solid.destination-over
     status: FAIL
     notes:
'%3CLI%3EFailed%20assertion%3A%20got%20pixel%20%5B255%2C255%2C0%2C255%5D%20at%20%2850%2C25%29%2C%20expected%20%5B0%2C255%2C255%2C255%5D%20+/-%205%3C/LI%3E'

(It uses YAML syntax, which is probably not a good idea. But the syntax 
is the least important issue for the results format.)
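
(For what it's worth, a report generator can consume that file with very 
little code - a sketch assuming PyYAML and that the notes stay 
URL-encoded HTML as above, not meant as anything definitive:)

import urllib
import yaml  # PyYAML, assuming the results stay in YAML

with open("results.yaml") as f:
    runs = yaml.safe_load(f)  # a list of {ua, results} entries

for run in runs:
    print(run["ua"])
    for entry in run["results"]:
        # undo the %xx escaping; in Python 3 this is urllib.parse.unquote
        notes = urllib.unquote(entry.get("notes", ""))
        print("  %s: %s  %s" % (entry["id"], entry["status"], notes))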

(But on the subject of syntax, I've seen one developer want a diff of 
the test results from two different versions of a browser, to see what's 
changed. It would be good if a normal line-based diff over the raw 
results data could do this, i.e. each test case should be on a separate 
line.)
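
(A sketch of the kind of thing I mean, again assuming the {ua, results} 
JSON structure and hypothetical file names - flatten each run to one 
sorted line per test id so that plain diff(1) between two dumps shows 
exactly which results changed:)

import json
import sys

# Usage: python flatten.py results-old.json > old.txt
#        python flatten.py results-new.json > new.txt
#        diff old.txt new.txt
with open(sys.argv[1]) as f:
    run = json.load(f)

for test_id in sorted(run["results"]):
    # one test case per line, in a stable order, so a normal
    # line-based diff works
    sys.stdout.write("%s\t%s\n" % (test_id, run["results"][test_id]))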

In particular, my results format includes the textual output of the test 
case ("<LI>Failed assertion: got pixel [255,255,0,255] at (50,25), 
expected [0,255,255,255] +/- 5</LI>" etc), which seems quite helpful for 
debugging tests and for debugging browsers - e.g. I can read across a 
row in the results table and if every browser fails in the same way, 
there may be a problem in the test or in the spec; or if the message is 
"got pixel [0,253,0,255], expected [0,255,0,255]" then I can see it's 
not a high priority to investigate; otherwise it's a failure I need to 
look at more closely in the browser.

So I think it's useful to include this information, not merely a 
pass/fail indicator, so that the test results can guide debugging.

One problem I've had with my test results is that I sometimes change a 
few test cases, but don't want to recompute the whole results set for 
every browser, so the results table shows slightly misleading data. It 
would be helpful for the results to say precisely what version of the 
tests they were running. In particular, they could say the Hg revision 
of the tests - then an offline results-table-generating tool can easily 
work out which test cases have been changed since that revision and mark 
them as 'did not run' instead of an outdated pass/fail. (Using the Hg 
revision means incremental changes to the test suite are easy - we don't 
have to worry about manually bumping a version number or about 
invalidating all the existing test results.)
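
(Roughly what I have in mind, as a sketch: assume the results file 
records the Hg revision it was generated against in a "revision" field, 
and that test ids map to file paths under tests/ - both of those details 
are made up here. "hg status --rev OLD --rev NEW" lists the files 
changed between the two revisions, and anything in that list gets 
reported as 'did not run' instead of its stale result:)

import json
import subprocess

with open("results.json") as f:
    run = json.load(f)

# Files changed in the test repository since the recorded revision;
# hg prints lines like "M tests/2d.composite.solid.source-over.html".
out = subprocess.Popen(
    ["hg", "status", "--rev", run["revision"], "--rev", "tip"],
    stdout=subprocess.PIPE).communicate()[0].decode("utf-8")
changed = set(line.split(None, 1)[1] for line in out.splitlines() if line)

for test_id, result in sorted(run["results"].items()):
    path = "tests/%s.html" % test_id  # hypothetical id-to-path mapping
    status = "did not run" if path in changed else result
    print("%s\t%s" % (test_id, status))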

-- 
Philip Taylor
pjt47@cam.ac.uk
