Re: dependencies in tests

> On 20 Jun 2015, at 19:38, Gérard Talbot <css21testsuite@gtalbot.org> wrote:
> 
> On 2015-06-20 08:29, Florian Rivoal wrote:
>>> On 18 Jun 2015, at 22:03, Gérard Talbot <css21testsuite@gtalbot.org> wrote:
>>> A is a boolean condition of some sort; if A does not exist, then we cannot check for B.
>>> Not having A makes the test result undefined or unknown, or makes the test not applicable.
>> Yes, exactly. Which is my question: how do we, in prose and in
>> machine-readable metadata, indicate that a test is not applicable
>> under certain conditions (in this case, the condition being that
>> vertical text is not supported)?
>>> E.g. Prince version 10.2r1 is a web-aware HTML-to-PDF-with-CSS conversion application. So tests with the "animated" and "interact" flags do not apply to such a UA; they should be avoided and the related test results ignored.
>> Right. But there is no flag for "vertical", so how do I mark up a test
>> as irrelevant?
> 
> If you know that a particular UA does not support vertical writing modes, then don't take the writing-modes test suite; otherwise, ignore such test results.

I won't take the writing-modes test suite, but test cases that link (using "help") both to writing modes and to specs I care about get pulled automatically into the test suites of the specs I care about. Manageable, but annoying.
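To make the mechanics concrete, the head of such a cross-spec test carries something like the following (the URLs are illustrative; css-break-3 here is just a stand-in for whichever second spec the test also exercises):

    <link rel="help" href="http://www.w3.org/TR/css-writing-modes-3/">
    <link rel="help" href="http://www.w3.org/TR/css-break-3/">

Because of the second link, the test gets filed under css-break-3 automatically, even though it tells you nothing about an implementation that has no vertical text support.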

> 
>> When submitting an implementation report, it should be possible to
>> explain these failures away if you're only implementing one of the
>> two specs, but both making that claim and checking that it is
>> justified is extra work.
>> Creating extra work for yourself regarding a specification you are
>> not interested in is a disincentive to writing the test. Even if it
>> is not a strong one, we don't have enough of these tests as it is,
>> so this is bad.
>> Maybe a way out of this would be to add a feature to
>> testharness/shepherd/annotate.js: when giving the list of tests for
>> a particular spec, it would also present you with check boxes for
>> every other spec cross-referenced by the tests; if you don't claim
>> to implement one of those specs, you can uncheck it and get a test
>> suite with the irrelevant tests removed.
> 
> Is this worth it... I mean, implementing this would mean significant work... and we don't have many tests that are true "cross-spec tests"...

I don't know if it is worth the effort. We already have the data, so hopefully it wouldn't be too hard to pull off, but then again, I'm not the one doing it, so I cannot judge.

As for not having many cross-spec tests: a chicken-and-egg problem? Maybe not, and nobody would write them anyway. But personally, I'd be more inclined to write them if they didn't cause this kind of (manageable, but still) annoyance.

>> Until we get this or something similar, I guess the right thing to
>> do is to make sure the test references both specs, so that the
>> feature described above can work when we introduce it, and maybe to
>> put a prose note in the test to inform human reviewers / testers.
> 
> Well, isn't it sufficient to have 2 distinct <link rel="help"> links pointing to 2 different specs?

It is enough, upon manual inspection of the source, to determine that the reason this test failed is acceptable. It is probably not enough to let anyone running the test know that it isn't relevant to them and that they should skip it.
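In practice, the best a test author can do today is probably something along these lines (same illustrative URLs as above; the note's wording is just a sketch, and the "may" flag is only appropriate when the tested behavior really is optional):

    <link rel="help" href="http://www.w3.org/TR/css-writing-modes-3/">
    <link rel="help" href="http://www.w3.org/TR/css-break-3/">
    <meta name="flags" content="may">
    <p>This test is only relevant to UAs that support vertical
       writing modes; if yours does not, skip it.</p>

A human reading the source sees the note, but nothing machine-readable tells a harness to drop the test from a run that only covers the other spec.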

> I am inclined to think that you may be going too far here. I do not see a big problem with failing a true "cross-spec test". Furthermore, if such a test has a "may" flag, is there really a problem?
> I think there are much bigger problems with tests and test suites right now.

As far as prioritization is concerned, I'm sure there are indeed more pressing problems. I'd rate notifying spec owners (etc.) of new tests in the repo that need review as a (much) higher priority than this.

But this is not the first time I've run into this, and I've heard of others (Håkon, for instance) being confused by the inclusion of (failing) tests in a test suite where they didn't belong. An all-green result page sends quite a different message than one with red lines, even if the red lines can be argued away.

So maybe it is not high priority, but it still seems useful to me, both for internal usage and for PR reasons.

 - Florian

Received on Sunday, 21 June 2015 12:21:23 UTC