Re: [HTMLWG] CfC: Adopt "Plan 2014" and make some specific related decisions from L. David Baron on 2012-10-23 (public-html@w3.org from October 2012)

From: L. David Baron <dbaron@dbaron.org>
Date: Tue, 23 Oct 2012 19:11:54 +0200
To: James Graham <jgraham@opera.com>
Cc: public-html@w3.org
Message-ID: <20121023171154.GA24662@crum.dbaron.org>

On Monday 2012-10-22 17:11 +0200, James Graham wrote:
> I have been vaguely pondering the notion of assigning each test a
> priority, so that an implementation that passed all the P1 tests
> would have "basic support" for a feature, and one that passed all
> the P1-P5 tests would have "excellent support" for a feature, or
> something. That might provide a reasonable balance between
> conformance tests as a promotional tool — something which it is
> clear that the market desires, regardless of what we may think — and
> conformance tests as a way of actually improving interoperability.
> 
> I have several concerns with this idea. It might be a lot of work,
> and one certainly couldn't expect test submitters to do it. It might
> lead to test classification fights (but surely this would be better
> than people fighting to drop tests altogether?). A single test might
> fail for a P1 reason ("there is a huge security hole") or a P3
> reason ("the wrong exception type is thrown"). I don't know if these
> are insurmountable issues or if there is some other tack we could
> take across this particular minefield.

So conformance tests might fail for a bunch of reasons:
 * crashes
 * hangs
 * security vulnerabilities shown by test failure
 * other incorrect behavior shown by test failure
In practice, I think the vast majority of failures observed fall
into the "other incorrect behavior" category.

The actual harm caused by said "other incorrect behavior" seems to
fall into a bunch of categories (in every case, where I say "content",
which is the relevant term for document formats, read "client/server
implementors" for network protocols, etc.):

1. an implementation with this failure can't correctly handle some
   existing content tested only on implementations without this
   failure

2. content tested only on implementations with this failure fails to
   work on implementations that do not have this test failure

3. content tested only on implementations with this failure
   constrains future feature development on the Web

4. developing content that works both in implementations with and
   without this failure requires extra work

Now, there are some cases (e.g., some cases with CSS rendering
issues) where we can limit the scope of "not work" to things not as
severe as complete inability to use the page.  But I think those are
the minority rather than the majority; just about anything testable
by script can completely prevent a page from working, since a script
could depend on the correct behavior.

There are also probably ways to quantify the amount of extra work
needed for item (4), but I'm not sure how well we can do it.


I think the real importance of fixing correctness bugs depends on
how much of the Web (weighted by frequency of use) uses and depends
on the behavior or on how much we want the feature to be used.

However, I think we ought to decide how much we want the feature to
be used before we spec and implement it, rather than specifying and
implementing unimportant features and then not writing tests for
them.

And I think the importance in terms of frequency-of-use (and how
much of the Web wouldn't work in an implementation with that bug) is
very hard to maintain; we'd need to bump any test to P1 the moment
Facebook, Gmail, etc. started depending on the behavior it tests,
which is very hard to notice when implementations don't actually
fail the test.
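
To make the frequency-of-use weighting concrete, here is a minimal
Python sketch. Everything in it is hypothetical: the site names, the
usage shares, the test IDs, and the `weighted_dependence` helper are
illustrative assumptions, not real measurements or an existing tool.
The `depends_on` table is exactly the input that would be hard to keep
current, since a new dependency is invisible while implementations
still pass the test.

```python
# Hypothetical usage data: site -> share of total page views (illustrative only).
usage_share = {
    "site-a.example": 0.50,
    "site-b.example": 0.30,
    "site-c.example": 0.20,
}

# Hypothetical record of which sites depend on the behavior each test checks.
# In practice this is the part that is hard to observe and maintain.
depends_on = {
    "test-001": {"site-a.example", "site-b.example"},
    "test-002": {"site-c.example"},
    "test-003": set(),
}


def weighted_dependence(test_id, depends_on, usage_share):
    """Sum the usage share of every site that depends on the tested behavior."""
    return sum(usage_share.get(site, 0.0) for site in depends_on.get(test_id, ()))


# Score every test; a higher score means more of the (assumed) Web depends on it.
scores = {t: weighted_dependence(t, depends_on, usage_share) for t in depends_on}
```

Under these assumed numbers, test-001 would outrank test-002, and
test-003 would score zero until some site started depending on it.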

So I tend to think that trying to prioritize the tests is a lot of
work and mostly out of scope for a standards conformance test suite.
I think the basic result of a standards conformance test suite ought
to be either "has known bugs" or "does not have known bugs"; it then
ought to be possible to list and describe these bugs, and from the
list of bugs (not the list or number of failures) describe their
severity (with the caveat that some failures might leave other
potential bugs untested).
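
The reporting model described above could be sketched roughly as
follows in Python. This is only an illustration under stated
assumptions: the test IDs, bug names, severity labels, and the
`summarize` function are all hypothetical, and the mapping from
failures to bugs is assumed to come from manual analysis.

```python
from collections import defaultdict

# Hypothetical mapping from failing test to the underlying bug it exposes.
failure_to_bug = {
    "test-101": "bug-A",
    "test-102": "bug-A",
    "test-103": "bug-A",
    "test-201": "bug-B",
}

# Hypothetical severity, assigned per bug after analysis, not per failing test.
bug_severity = {"bug-A": "minor", "bug-B": "security"}


def summarize(failures, failure_to_bug, bug_severity):
    """Collapse failing tests into known bugs and report a yes/no status."""
    tests_by_bug = defaultdict(list)
    for test in failures:
        # A failure not yet traced to a bug is kept under its own placeholder.
        tests_by_bug[failure_to_bug.get(test, "untriaged:" + test)].append(test)
    bugs = {
        bug: {"severity": bug_severity.get(bug, "unknown"), "tests": sorted(tests)}
        for bug, tests in tests_by_bug.items()
    }
    status = "has known bugs" if bugs else "does not have known bugs"
    return status, bugs


status, bugs = summarize(
    ["test-101", "test-102", "test-103", "test-201"],
    failure_to_bug,
    bug_severity,
)
```

In this made-up data, three failures collapse into one minor bug while
a single failure corresponds to a security bug, which is why the list
of bugs says more about severity than the number of failures does.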

-David

-- 
𝄞   L. David Baron                         http://dbaron.org/   𝄂
𝄢   Mozilla                           http://www.mozilla.org/   𝄂
Received on Tuesday, 23 October 2012 17:13:15 UTC