RE: Should we test SHOULD?

On Tuesday, September 24, 2013 3:03 AM Tobie Langel wrote:
> To: Robin Berjon; Rebecca Hauck; Linss, Peter; fantasai.lists@inkedblade.net
> Cc: James Graham; public-test-infra@w3.org
> Subject: Re: Should we test SHOULD?
> 
> On Monday, September 23, 2013 at 12:31 PM, Robin Berjon wrote:
> > On 23/09/2013 11:48 , James Graham wrote:
> > > Therefore I conclude that we should take option 1); simply consider
> > > "should" level conditions in the spec as untestable (for us) as
> > > other implementation-specific requirements on UI. If you really want
> > > to call out these somehow, I would be happy for people to write
> > > should.txt files describing all the should level conditions to guide
> > > people writing implementation-specific tests.
> >
> > I would opt for a "modified (1)". By default, don't test SHOULD (they
> > normally aren't testable) but allow people to use their better
> > judgement and include tests for SHOULD in cases where they feel the
> > specification was exceedingly cautious.
> >
> > The issue of devices not being able to support some functionality is
> > orthogonal, and should really not be handled by use of SHOULD in
> > specifications. For instance, an eInk device won't render colour, but
> > it would not IMHO be a good use of anyone's time to put all mention of
> > colour behind SHOULD in any CSS module that relates to it.
> 

For CSS we write tests expecting a specific configuration on your machine [1] (we call these out in our configuration notes): black text on a white background, no minimum font size, and so on. Defining these gives tests a baseline expectation for a machine configuration. If a test deviates from that baseline, you need to call it out in the test's prerequisites or flag the test in some way. Once a test is categorized and grouped, it is then easy to decide how to run your tests.
 
For instance, in the case of an eInk device, you might create a grouping containing only the cases that do not test color. However, I suspect the typical test requirement will be a color screen by default, so for eInk/monochrome devices you may want a different way of grouping. I could see a tester selecting eInk/monochrome, running the suite, and having only the relevant tests reported as true pass/fail. Non-relevant cases can either not be run or be reported differently.
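The grouping idea above can be sketched as a simple filter over a test manifest, splitting tests into "runnable" and "not testable" for a device that lacks certain capabilities. This is a minimal sketch only: the manifest layout and the "color" flag are hypothetical (the real CSS WG flag set is defined in [2]), though "svg" is an actual requirement flag.

```python
# Hypothetical sketch: grouping tests by requirement flags so a runner on a
# monochrome (eInk) device can skip color-dependent cases. Flag names and the
# manifest format are illustrative, not the actual CSS WG test harness.

def select_for_device(tests, unsupported_flags):
    """Split tests into (runnable, not_testable) for a device that lacks
    the capabilities named in unsupported_flags."""
    runnable, not_testable = [], []
    for name, flags in tests.items():
        # Any overlap with an unsupported capability makes the test
        # "not testable" on this device rather than a failure.
        (not_testable if flags & unsupported_flags else runnable).append(name)
    return sorted(runnable), sorted(not_testable)

# Example manifest: test name -> set of requirement flags.
tests = {
    "background-position-svg-001": {"svg"},
    "color-keywords-001": {"color"},   # "color" flag is hypothetical
    "margin-collapse-001": set(),
}

runnable, not_testable = select_for_device(tests, {"color"})
```

A tester selecting an eInk/monochrome profile would simply pass the set of capabilities that device lacks, and only the remaining tests would report true pass/fail.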

Typically for CSS we use requirement flags [2] to identify tests that have external requirements. These requirements are usually external specs or systems that are not themselves being tested but are necessary in order to exercise the scenario from the spec. An image, for instance, often has no direct connection to the CSS property under test, yet it may be necessary to verify that the property is working correctly. If a UA doesn't support images, or has them disabled, the image cases will fail; but that is not a failure of the CSS property being tested, it is a missing capability in the UA. The result should then not be a "failure" but "not testable".

A perfect example of this is testing the CSS property 'background-position'. Let's assume the test specifies an SVG image. If the UA doesn't support SVG, does the test fail? No, because the test was testing 'background-position', not SVG support. This is the reason for the SVG flag in the CSS test format: SVG is an external requirement for some tests, and a UA's lack of SVG support should not impact the pass/fail results for the property itself.
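The three-way outcome described above (pass/fail/not testable) can be sketched as a small result-mapping function. This is an illustrative sketch, not the actual CSS test harness logic; the function name and capability strings are assumptions.

```python
# Hypothetical sketch: map a raw test outcome to pass/fail/not-testable.
# If the UA lacks a capability the test depends on (e.g. SVG for a
# 'background-position' test that uses an SVG image), the result is
# "not testable" rather than "fail".

def resolve_result(raw_pass, required, supported):
    """raw_pass: did the rendering check succeed?
    required: set of external capabilities the test depends on.
    supported: set of capabilities the UA actually provides."""
    if required - supported:
        # A missing external requirement masks the real result.
        return "not testable"
    return "pass" if raw_pass else "fail"
```

For example, a UA without SVG running the SVG-flagged 'background-position' test would get `resolve_result(False, {"svg"}, set())`, which reports "not testable" instead of a spurious failure.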

> It would be interesting to hear the CSS WG's perspective on the matter.
> Crude stats over the csswg-test repository report the following:
> 
>     $ grep "flags" -r ./ | grep "should" | wc -l
>     28
>     $ grep "flags" -r ./ | grep "may" | wc -l
>     214
> 
> 
> 
> i.e. there's roughly 30 tests that are marked as testing SHOULD and over 200
> that are marked as testing MAY.
> 
> Peter, Rebecca, fantasai, others, any thoughts on how to best handle
> SHOULD/MAY requirements in specs? For ref, the thread starts here:

From what I have read in this thread, there is still one missing component: what are these tests for? Are they for testing UA interoperability, UA spec compliance, or whether the spec can be implemented at all? Without a clear goal or definition of what you are trying to accomplish, trying to decide what to do with SHOULD/MAY cases is pointless.

If you are only testing for UA interoperability, then the majority of the time SHOULD/MAY cases do not need to be included in the testing (there are a few rare exceptions).

If you are verifying UA spec compliance, or verifying that specs can be implemented with reasonable coverage for moving a spec forward to REC, then you will need to cover the SHOULD/MAY cases. In that case, however, it is often tricky to define simple pass conditions, since there may be many possible outcomes. For CSS we were often deliberately vague with our pass conditions in these SHOULD/MAY cases when there were multiple acceptable solutions. Having personally written a large majority of those cases, I know the "pass" conditions are the trickiest part to get right so that the tests pass and fail when appropriate.


> 
> http://lists.w3.org/Archives/Public/public-test-infra/2013JulSep/0240.html

> 
> Thanks,
> 
> --tobie


I think that solution 2 or 3 would be the better choice for the spec and for moving it down the REC track. It requires a bit more work on the back-end system to report tests correctly, and on test writers to write good SHOULD/MAY tests, but I think it provides the best outcome for both the spec and UAs. The spec benefits by having coverage, UAs benefit by having tests to verify against, and the community can see which interesting things need to be defined better to move them out of the SHOULD/MAY category in future specs.

--
Thanks,
Arron Eicholz

[1] - http://test.csswg.org/harness/

[2] - http://wiki.csswg.org/test/format#requirement-flags

Received on Tuesday, 24 September 2013 17:20:29 UTC