Unconference topic suggestion: Conformance checker tests

This has been discussed on IRC a few times. I'd like to suggest that the
topic be visited at the f2f:

Test case collaboration for testing conformance checkers: How can  
implementation-independent conformance test cases be produced as a  
community effort?

Some random points:
  * Validator.nu does not undergo as much systematic testing as would  
be proper.
  * It would be good to have test cases that aren't tied to a  
particular implementation. (This is harder than it first appears.)
  * Boolean tests are easy: A test case should either elicit a
validation failure or pass, as documented in the test metadata.
  * It would be good for tests developed for browser testing to carry a
metadata annotation that tells whether a given document is supposed to
be conforming. This would allow browser tests to be used for
conformance checker stress testing.
  * A test suite should not require conformance checkers to identify  
particular error conditions by an id defined by the test suite,  
because this would limit possible implementation approaches for  
conformance checkers.
    - Example: A test suite should not require a conformance checker  
to report error 123 when a figure element lacks an embedded content  
child, because grammar-based implementations don't have a reasonable  
way to map derivation failures to arbitrary numbers.
  * A test suite should not require errors to be counted to a greater
precision than zero vs. more than zero.
    - Example: When there's a misplaced element, grammar-based
implementations may resync in implementation-specific ways.
    - Example: Implementations should have the freedom to optimize  
situations where the tree construction spec causes a single input  
artifact to hit multiple parse errors.
    - Example: Implementations should have the freedom to suppress
further errors, e.g. from an entire misrooted subtree, when there's
reason to believe that one error would otherwise cause a lot of
user-unfriendly secondary errors.
  * Even though I think that centrally-defined error ids are an  
unreasonable limitation on implementation strategies (they are  
grammar-hostile, in particular), I think it is quite reasonable for  
test cases to come with a human-readable note about what kind of  
error they are supposed to elicit.
  * Error locations are not unambiguous. Some implementations may  
point to an *approximate* single character. Others may identify a  
source range, such as an entire tag. Yet others might identify a  
range occupied by a single attribute that is in error.
  * To avoid issues with counting column positions in UTF-16 code
units as opposed to Unicode characters, test cases should use the
Basic Latin range when a given error can be elicited with Basic Latin
only.
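To make the UTF-16 vs. Unicode concern concrete, here's a small Python
sketch (the sample character and variable names are mine, purely for
illustration): a character outside the Basic Multilingual Plane takes
two UTF-16 code units but is a single Unicode character, so two
checkers can honestly report different column numbers for the same
error location.

```python
# One astral character (U+1D49C, MATHEMATICAL SCRIPT CAPITAL A)
# followed by 'x'. In UTF-16, U+1D49C is a surrogate pair.
line = "\U0001D49Cx"

# Column of 'x' when counting Unicode characters (0-based):
col_in_characters = line.index("x")  # 1

# Column of 'x' when counting UTF-16 code units (0-based):
prefix = line[:line.index("x")]
col_in_utf16_units = len(prefix.encode("utf-16-le")) // 2  # 2
```

Restricting test cases to Basic Latin sidesteps the discrepancy
entirely, since every Basic Latin character is one code unit in both
countings.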

So far, the best idea I have seen is that each non-conforming test
case should contain only one error of interest, and this error should
be the first error in source order. The source location of the first
error should be given as a range inside which all reasonable error
location policies would place the error.

Thus, a test harness would sort the errors given by the checker by  
source location and compare the first one to the required location  
range given in the test metadata.
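To sketch what such a harness might look like (the metadata field
names and the error-tuple shape below are my invention, not a proposed
format):

```python
def check_result(errors, metadata):
    """Decide whether a checker's output satisfies a test case.

    errors: list of (line, column) locations reported by the checker.
    metadata: dict with 'conforming' (bool) and, for non-conforming
    cases, 'first_error_range' as an inclusive pair
    ((line, col), (line, col)).
    """
    if metadata["conforming"]:
        # A conforming document must elicit zero errors.
        return not errors
    if not errors:
        # A non-conforming document must elicit at least one error.
        return False
    # Sort by source location; only the first error is checked, so
    # implementation-specific resync behavior after it doesn't matter.
    first = sorted(errors)[0]
    start, end = metadata["first_error_range"]
    return start <= first <= end

# Example: the checker reports errors at (9, 1) and (5, 12); the test
# allows the first error anywhere on line 5, columns 10 through 20.
meta = {"conforming": False, "first_error_range": ((5, 10), (5, 20))}
print(check_result([(9, 1), (5, 12)], meta))  # True
```

Because tuples compare lexicographically, sorting by (line, column)
and doing a range comparison both fall out of plain tuple ordering.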

Henri Sivonen

Received on Wednesday, 31 October 2007 19:38:01 UTC