Re: Test suites and RFC2119

* Rich Tibbett wrote:
>We currently define tests in test suites for SHOULD requirements. A 
>problem occurs when those tests are used to gauge the overall compliance 
>of an implementation to the full test suite. An implementation could 
>theoretically be 100% compliant without needing to pass non-MUST and 
>non-MUST NOT tests.

It is not reasonable to think you can express how well an implementation
interoperates, or how well it avoids harm, with a single number. Weighting,
for instance, is another issue. If you implement XMLHttpRequest perfectly
except that you decode text using Windows-1252 instead of UTF-8, should you
score 99%, 80%, or 1%? Clearly your implementation is rather broken, and you
cannot use it in many circumstances the way you could use others; there is
no need to express this with some number. Nor would it be reasonable for
people to argue over how to change the test suite or the scoring system so
that the "total" properly expresses how broken the implementation is.

There is no difference in that regard between MUST-, SHOULD-, and MAY-
level requirements. If the XMLHttpRequest specification says implementations
should cease communicating with a server, and an implementation does not do
that, we cannot say whether that is a 10% or a 0.001% problem unless we look
very carefully at what is actually going on and put it in context. Likewise,
an XMLHttpRequest implementation may support iso-8859-2; if you do everything
perfectly and also implement iso-8859-2, would you score 101% because you
handle more content, or 99% because your implementation is not as lean as it
could be?
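
For concreteness, a rough sketch (in TypeScript) of the kind of single
test being discussed; the resource URL is made up and is assumed to serve
the two bytes 0xC3 0xA4 ("ä" in UTF-8) with no charset parameter:

  // Checks whether an unlabelled response body is decoded as UTF-8. A test
  // like this tells you that the default decoding is wrong, not how much
  // the failure matters.
  function testDefaultDecoding(report: (pass: boolean) => void): void {
    const xhr = new XMLHttpRequest();
    xhr.open("GET", "/tests/resources/utf-8-no-charset.txt");
    xhr.onload = () => {
      // A UTF-8 decoder yields "ä" (U+00E4) for these bytes; a Windows-1252
      // decoder would instead yield "Ã¤".
      report(xhr.responseText === "\u00E4");
    };
    xhr.onerror = () => report(false);
    xhr.send();
  }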

>Perhaps we should introduce 'bonus' points for SHOULD/SHOULD NOT/MAY and 
>RECOMMENDED tests and not have them contribute to overall compliance 
>output, thereby allowing implementations to claim 100% compliance with 
>MUST/MUST NOT tests. An implementation can then optionally collect any 
>available bonus points from requirements marked up with the other keywords.

http://lists.w3.org/Archives/Public/public-xml-testsuite/2006Sep/0000 is
a simple test for XML conformance that Opera fails. As far as I am aware
it is not part of the official test suite. So what would Opera score in
terms of XML conformance? It can't be 100%, because Opera has bugs. There
is little use in trying to optimize these numbers; normal people do not
know whether some browser or other scores 91 or 97 or 99 points on
"Acid3". They might be vaguely aware if you score 100 out of 100 points,
but that's it.

Some people take this as far as saying should-level requirements should
not be tested in "official" test suites. That's entirely incompatible
with the idea that we make specifications and test suites so people do
not run into certain problems. We mean to affect many implementations,
so knowing how a single one fares is not very useful without the context
of the other implementations.

We should make good tests and publish clear and honest results. If we got
to the point where, as with the example above, test suites are maintained
so that good tests like mine get added, that would already be a major
improvement. We should not let worries about scores inhibit this. If you
do not implement a should-level behavior, that should be clear from test
suite results. If you do not implement an optional feature, that should
be clear from test suite results.
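
Purely as an illustration of what "clear from test suite results" could
mean (in TypeScript; the types and names here are my own, not anything
defined by the W3C), results could carry the level of the requirement they
test, so that an unimplemented should-level behavior or optional feature
stays visible by name instead of disappearing into a total:

  type Level = "MUST" | "SHOULD" | "MAY";

  interface TestResult {
    name: string;   // test identifier
    level: Level;   // RFC 2119 level of the requirement under test
    pass: boolean;
  }

  // Print failing tests grouped by requirement level, rather than a score.
  function reportFailures(results: TestResult[]): void {
    for (const level of ["MUST", "SHOULD", "MAY"] as const) {
      const failed = results.filter(r => r.level === level && !r.pass);
      console.log(`${level}: ${failed.length} failing test(s)`);
      for (const r of failed) {
        console.log(`  ${r.name}`);
      }
    }
  }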

>Wondering if there is any set W3C thinking on this or a way of including 
>SHOULD tests in test suites but clearly indicating that they are, 
>basically, optional and do not count towards the overall compliance 
>score? I couldn't find anything in [1].

It would be misleading to claim a should-level requirement is "basically
optional" simply because analysis and human judgement are required to
understand whether failing to implement the requirement is a problem.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
