Re: Should we test SHOULD?

On 22/09/13 10:26, Tobie Langel wrote:
> Hi all,
>
> A pull request[1] on the main repo brought up the issue of how to
> handle optional normative requirements (SHOULD, MAY, etc.). It's not
> the first time this issue has been discussed. I found, for example, a
> rather long thread on this topic[2] on the public-webapps mailing
> list. I couldn't find, however, a recommended practice on how we
> should handle this. I'd like us to agree on one and document it.
>
> Here are a number of proposals:
>
> 1) We only test MUST normative requirements.
>
> 2) We test all normative requirements, and rely on result
> interpretation to determine whether an implementation conforms to the
> spec (an implementation can fully conform even though it fails a
> number of tests, as long as those are determined to be SHOULD/MAY
> tests).
>
> 3) We test all normative requirements but add metadata to those
> tests that aren't MUST requirements. This allows running a subset of
> tests when SHOULD requirements don't make sense. E.g. avoid running
> media capture tests on a device that doesn't have a camera.
>
> I'd be inclined to go with 3), but I'm eager to hear others' thoughts
> on the subject.

So, first of all, I hope we all agree that specs which use "should" for 
interop-affecting things are broken. So we can take it as read that any 
"should" conditions are on the UI or other things that can legitimately 
differ between implementations. In this case it appears that the 
"should" has been used to encourage implementors to provide a certain 
UI feature on a certain subset of UAs (those with more than one 
camera), which is fine (although often misguided; cf. HTTP's attempts 
to require UI).

We also need to realise that, due to the requirement of 
browser-neutrality, the web-platform-tests can't cover everything. For 
example, they can't cover cases that only occur if there is a GC before 
a certain function call, because there is no browser-neutral way to 
force a GC. They also can't hope to cover UI details, because each 
browser will have a slightly different UI whose detailed flow would 
need its own tests. Merely testing "does this UI exist" isn't very 
useful unless your goal is to hand out conformance badges, which isn't 
something that the W3C has done, and is something that I would be very 
strongly opposed to, as it sets incredibly perverse incentives for 
testing. Given that we don't, and shouldn't, do formal conformance 
testing, I don't see the value in option 2) above and think we can 
safely ignore it.

Given the above, I think accepting "should" tests has low, but possibly 
non-zero, value. The remaining question is "what's the cost?". Well, we 
can of course allow people to write these tests and add some metadata 
indicating that they are "should"-level requirements. This on its own 
doesn't seem very useful; for any particular implementation it's hard 
to know whether it should conform to the "should" or not. So the 
easiest option would be to run none of them. This takes us back down to 
zero value (plus the wasted time writing the tests). Alternatively, one 
could run all of them but not expect them to pass. But this seems 
onerous, given that these tests will largely be manual tests and it is 
difficult to ensure that everyone actively running the tests can work 
out which tests are applicable in which situations.
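
To make the metadata idea concrete, here is a minimal sketch of the 
filtering a runner might do. The "flags" field and the "should"/"may" 
values are hypothetical, loosely modelled on the CSS test suite's 
metadata flags; nothing like this exists in the shared infrastructure 
today.

    # Hypothetical sketch: filter a test manifest by requirement level.
    # The "flags" metadata and the "should"/"may" values are invented
    # for illustration (loosely modelled on the CSS test suite's
    # <meta name="flags"> convention); this is not an existing
    # web-platform-tests feature.
    tests = [
        {"path": "media-capture/camera-switch.html", "flags": {"should"}},
        {"path": "dom/nodes/Node-appendChild.html", "flags": set()},
    ]

    def must_level_tests(manifest):
        """Yield only tests carrying no "should" or "may" flag."""
        for test in manifest:
            if not ({"should", "may"} & test["flags"]):
                yield test

    for test in must_level_tests(tests):
        print(test["path"])  # only the MUST-level test gets run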

So, in order to actually make use of should-level tests, someone 
(either the test author or the person responsible for actually running 
them in each implementation) would have to build a much more detailed 
ontology describing the reasons for each "should" being applicable or 
not. This could then be used on a per-implementation, per-platform, 
per-device basis to work out whether the test ought to pass. To me that 
sounds like a lot of work for what I consider to be low-value tests. 
Therefore I conclude that we should take option 1): simply consider 
"should"-level conditions in the spec as untestable (for us), like 
other implementation-specific requirements on UI. If you really want to 
call these out somehow, I would be happy for people to write should.txt 
files describing all the should-level conditions, to guide people 
writing implementation-specific tests.
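
For concreteness, the per-implementation applicability data I describe 
above might boil down to something like the following sketch. All the 
paths, condition names, and the shape of the data are invented for 
illustration; the point is the maintenance burden, not the mechanism.

    # Hypothetical sketch of the "ontology" idea: each "should"-level
    # test declares the condition under which it applies, and the
    # runner evaluates that against a description of the
    # implementation/platform/device under test. All names invented.
    applicability = {
        "media-capture/camera-switch-ui.html": "multiple_cameras",
        "fullscreen/exit-ui-hint.html": "has_display",
    }

    device = {"multiple_cameras": False, "has_display": True}

    def expected_to_pass(test_path):
        condition = applicability.get(test_path)
        # Tests with no declared condition are unconditionally expected.
        return condition is None or device.get(condition, False)

    for path in applicability:
        print(path, expected_to_pass(path))

Maintaining that mapping per implementation, per platform, and per 
device is exactly the work that I'm arguing is out of proportion to the 
value of the tests.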

Received on Monday, 23 September 2013 09:49:38 UTC