- From: L. David Baron <dbaron@dbaron.org>
- Date: Tue, 23 Oct 2012 19:11:54 +0200
- To: James Graham <jgraham@opera.com>
- Cc: public-html@w3.org
On Monday 2012-10-22 17:11 +0200, James Graham wrote:
> I have been vaguely pondering the notion of assigning each test a
> priority, so that an implementation that passed all the P1 tests
> would have "basic support" for a feature, and one that passed all
> the P1-P5 tests would have "excellent support" for a feature, or
> something. That might provide a reasonable balance between
> conformance tests as a promotional tool — something which it is
> clear that the market desires, regardless of what we may think — and
> conformance tests as a way of actually improving interoperability.
>
> I have several concerns with this idea. It might be a lot of work,
> and one certainly couldn't expect test submitters to do it. It might
> lead to test classification fights (but surely this would be better
> than people fighting to drop tests altogether?). A single test might
> fail for a P1 reason ("there is a huge security hole") or a P3
> reason ("the wrong exception type is thrown"). I don't know if these
> are insurmountable issues or if there is some other tack we could
> take across this particular minefield.

So conformance tests might fail for a bunch of reasons:

 * crashes
 * hangs
 * security vulnerabilities shown by test failure
 * other incorrect behavior shown by test failure

In practice, I think the vast majority of failures observed fall into
the "other incorrect behavior" category.

The actual harm caused by said "other incorrect behavior" seems to
fall into a bunch of categories (where in all cases, "content" that's
relevant to document formats could be "client/server implementors"
relevant to network protocols, etc.):

 1. an implementation with this failure can't correctly handle some
    existing content tested only on implementations without this
    failure

 2. content tested only on implementations with this failure fails to
    work on implementations that do not have this test failure

 3. content tested only on implementations with this failure
    constrains future feature development on the Web

 4. developing content that works both in implementations with and
    without this failure requires extra work

Now, there are some cases (e.g., some cases with CSS rendering
issues) where we can limit the scope of "not work" to things not as
severe as complete inability to use the page. But I think those are
the minority rather than the majority; just about anything testable
by script can completely prevent a page from working, since a script
could depend on the correct behavior. There are also probably ways to
quantify the amount of extra work needed for item (4), but I'm not
sure how well we can do it.

I think the real importance of fixing correctness bugs depends on how
much of the Web (weighted by frequency of use) uses and depends on
the behavior, or on how much we want the feature to be used. However,
I think we ought to decide how much we want the feature to be used
before we spec and implement it, rather than specifying and
implementing unimportant features and then not writing tests for
them. And I think the importance in terms of frequency of use (and
how much of the Web wouldn't work in an implementation with that bug)
is very hard to maintain; we'd need to bump any test to P1 the moment
facebook, gmail, etc. started depending on the behavior it tests,
which is very hard to notice when implementations don't actually fail
the test.

So I tend to think that trying to prioritize the tests is a lot of
work and mostly out of scope for a standards conformance test suite.
I think the basic result of a standards conformance test suite ought
to be either "has known bugs" or "does not have known bugs"; it then
ought to be possible to list and describe these bugs, and from the
list of bugs (not the list or number of failures) describe their
severity (with the caveat that some failures might leave other
potential bugs untested).

-David

-- 
𝄞   L. David Baron                         http://dbaron.org/   𝄂
𝄢   Mozilla                           http://www.mozilla.org/   𝄂
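The bug-centric reporting model sketched in the message — a top-level "has known bugs" verdict, with severity attached to the underlying bug a failure was triaged to rather than to the test or to a failure count — can be illustrated roughly as below. All names, the `Failure`/`summarize` structure, and the severity labels are hypothetical assumptions for illustration, not part of any real test harness:

```python
from collections import defaultdict
from dataclasses import dataclass

# Failure reasons, ordered least to most severe; labels are illustrative.
SEVERITY_RANK = {"incorrect-behavior": 0, "hang": 1, "crash": 2, "security": 3}

@dataclass(frozen=True)
class Failure:
    test_id: str   # the failing test
    bug_id: str    # the known bug this failure was triaged to
    severity: str  # why this test failed, not how important the test is

def summarize(failures):
    """Report a verdict plus per-bug severity, not a raw failure count."""
    bugs = defaultdict(list)
    for f in failures:
        bugs[f.bug_id].append(f)
    verdict = "has known bugs" if bugs else "does not have known bugs"
    report = {
        bug_id: {
            # A bug is as severe as the worst failure triaged to it.
            "severity": max(fs, key=lambda f: SEVERITY_RANK[f.severity]).severity,
            "tests": sorted(f.test_id for f in fs),
        }
        for bug_id, fs in bugs.items()
    }
    return verdict, report

# Two failures triaged to one underlying security bug count as one bug,
# so the report describes two bugs, not three failures.
verdict, report = summarize([
    Failure("dom/exception-type.html", "bug-101", "incorrect-behavior"),
    Failure("dom/use-after-free.html", "bug-202", "security"),
    Failure("dom/use-after-free-2.html", "bug-202", "security"),
])
```

Under this (assumed) triage step, the same test can legitimately surface a "security" bug in one implementation and an "incorrect-behavior" bug in another, which is exactly the case where a fixed per-test priority breaks down.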
Received on Tuesday, 23 October 2012 17:13:15 UTC