Re: compliance "scorecard"? from Eric A. Meyer on 2004-03-06 (www-style@w3.org from March 2004)

From: Eric A. Meyer <eric@meyerweb.com>
Date: Sat, 6 Mar 2004 10:24:22 -0500
To: www-style@w3.org
Message-Id: <a05210600bc6f8a0cf036@[192.168.1.31]>
At 18:06 -0600 3/5/04, Felipe Gasper wrote:

>Just an idea, would the W3C maybe be interested in hosting a CSS 
>compliance "scorecard" for the various browsers, which the browser 
>makers would maintain?

    Over the years, the W3C has steadfastly stayed out of the 
enforcement business.  They do provide test suites, which is great, 
but a suite does not take a stand on who's naughty or nice.  It just 
gives implementers a target at which to aim.

>Or I'm just thinking there should be some resource, since the specs 
>only give you a "Pleasantville" version of how to code your 
>presentation, that details things that are supported and aren't, 
>possibly with workarounds etc.

    I know a few things about compliance testing and publication of 
support information, so I'll throw in my perspective.  Up front, 
though, let me say that the essays Boris pointed to are excellent, 
and well worth reading.  Let me also say that I have a great deal of 
sympathy for your request, even though I'm about to explain why it's 
so difficult.
    Back when I first created a series of CSS tests (which eventually 
formed the bulk of the W3C's CSS1 Test Suite [1]) for myself and then 
decided to publish the results I found for Mac browsers of the day 
(inspired by work Braden McDaniel had already done for Windows 
browsers), it was easy to rate CSS compliance.  Generally, either 
something was supported, or wasn't.  If a property's or value's 
support had bugs, they were always so insanely glaring and pervasive 
that simple classifications were appropriate.  It made sense for a 
support chart to have four basic ratings: Y, N, P, and B for Yes, No, 
Partial, and Buggy.  And "Partial" was usually only used for 
properties that had some values supported, but not others.
    The support charts I created eventually became widely used and 
referenced because they provided a great deal of information in a way 
that was easily understandable.  You could glance at the chart and 
grasp what it was telling you.  (This is likely why subsequent 
support charts have used similar classification, and even styling.) 
It was easy to create a "Leader Board" by assigning weights to the 
various ratings and coming up with a percentage score-- rather like a 
scorecard, albeit one that conformed to my way of weighting the 
support ratings I'd assigned.
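For readers who never saw that kind of leader board, here is a minimal Python sketch of the idea. The weights, the sample browsers, and the function names (`score`, `leader_board`) are all hypothetical; the weighting used for the old charts isn't given here. What it illustrates is that the ranking depends as much on the chosen weights as on the ratings themselves.

```python
# Hypothetical weights for the four ratings; the original charts' weights are not given here.
DEFAULT_WEIGHTS = {"Y": 1.0, "P": 0.5, "B": 0.25, "N": 0.0}


def score(ratings, weights=None):
    """Percentage score for one browser, given a list of Y/N/P/B ratings."""
    if weights is None:
        weights = DEFAULT_WEIGHTS
    if not ratings:
        raise ValueError("no ratings to score")
    return 100.0 * sum(weights[r] for r in ratings) / len(ratings)


def leader_board(charts, weights=None):
    """Browser names ordered from highest to lowest score."""
    return sorted(charts, key=lambda name: score(charts[name], weights), reverse=True)


# Made-up ratings for five properties in two made-up browsers.
charts = {
    "Browser X": ["Y", "Y", "Y", "N", "N"],
    "Browser Y": ["Y", "P", "P", "P", "B"],
}

print(leader_board(charts))  # ['Browser X', 'Browser Y']  (60.0 vs. 55.0)

# Valuing "Partial" more generously flips the order (60.0 vs. 70.0).
generous = {"Y": 1.0, "P": 0.75, "B": 0.25, "N": 0.0}
print(leader_board(charts, generous))  # ['Browser Y', 'Browser X']
```

Nothing about either browser changed between the two calls; only the weights did, which is the sense in which such a scorecard reflects one person's way of weighting the ratings.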
    But we've come a long way since then.  Time was, you could say 
that Browser X either supported padding, or didn't, because support 
was generally about that binary.  Today, Browser X might support 
padding perfectly, except not apply it to 'select' elements.  The 
first question to ask is whether that's even a bug, and some would 
say it isn't, since form elements are notoriously tough to describe in
terms of the existing CSS box and inline models.  If it actually is a 
bug, the challenge is to represent the information "supports padding 
except on one element" in a way that won't confuse, bore, or turn 
away users.  Doing that in a simple chart is rather difficult.
    So if you wanted to really chart compliance, you'd have to START 
with a test suite that tested every property-and-value combination on 
every element in HTML.  For 'font-variant' alone, which has only two 
possible values in CSS2.1, that would be 182 tests-- twice the number 
of HTML elements (91 in HTML 4.01 [2]).   For a property like 
'background', which is shorthand for several other properties, each 
of which can have up to a dozen or two possible values... well, you 
see the problem.  I'm not a mathematician, but someone with mathy 
tendencies once estimated that a test suite testing all such 
combinations would likely approach billions (or maybe it was 
trillions) of tests.  Have you seen the selectors test suite [3]?
That alone has 176 tests, and it doesn't even have to test 
property-value combinations-- just whether or not a selector type is 
supported.
    It gets worse.  Suppose Browser Y has a bug where a floated 
element has its margins doubled in some circumstances, but not 
others.  Maybe it also has a bug where a float causes subsequent 
elements' content to drift leftward.  (IE/Win suffers from both.)
Number one, how do you decide exactly what factors trigger these 
bugs, and number two, how will you represent THAT in a useful way? 
If you have to add another axis to the suite, so that every
property-value combination is tested inside every potential
combination of HTML elements, I suspect you're going to arrive at
enough tests to rival
the number of electrons in the Solar System.
    (And don't even get me started on all the various version numbers 
of browsers, some of which might only go up a tenth or hundredth but 
still make noticeable changes in their CSS support.)
    You might reasonably expect to reduce the problem set by saying, 
"Eh, HTML, who cares?  That's the past.  We'll just create a small 
XML testing language with just enough elements, and test all those 
combinations."  That's still going to be an enormous number of 
combinations, and it also assumes that you make the language complicated
enough.  If a bug doesn't get tripped until eight block boxes are 
nested inside each other with a list item at the end of the fourth 
and an option list at the beginning of the first, you're either going 
to miss it because no such test exists, or it will get detected as 
soon as someone works through the forty-seven trillion tests that 
precede it.
    Of course, we could dismiss all this as over-engineering and go 
for a different solution, one that calls on human judgment to 
determine the relative support of browsers.  I'd be all for it, 
except human judgment is notoriously inconsistent.  The Worst Bug Of 
All Time tends to be whichever one just prevented me from doing what 
I wanted to do, no matter how obscure the circumstances that trigger 
it.  The same holds true for my perception of The Worst Browser Ever. 
I suspect I'm not alone in that kind of thinking.
    We could open it up for voting, thus deriving a community opinion 
that irons out individual vagaries.  The problem is that when the 
community votes, they vote their collective experience.  If someone 
declares 'font-color: red' (CSS has no such property) and it only
works in IE5/Win, they're probably going to go vote IE/Win good and
all the other browsers bad.  When an author declares 'width: 400px',
adds some padding and borders, and ends up with a box 462px wide,
he's going to slam the browsers that do the right thing, because that
isn't what he expected.  Such an
open voting system will end up, to a large degree, drowning the true 
picture in a sea of collective misunderstandings.
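The 462px figure falls straight out of the CSS box model, where 'width' sets the width of the content area and padding and borders are added outside it. A quick Python sketch of that arithmetic; the post doesn't say how the extra 62px was split, so the 25px padding and 6px borders per side below are hypothetical values picked to reproduce it, and `rendered_width` is a made-up name.

```python
def rendered_width(width, padding=(0, 0), border=(0, 0)):
    """Outer width of a box whose 'width' sets the content width (the CSS box model).

    padding and border are (left, right) pairs in pixels.
    """
    return width + sum(padding) + sum(border)


# Hypothetical padding and borders that add up to the 62px in the example.
print(rendered_width(400, padding=(25, 25), border=(6, 6)))  # 462
```

An author who expects the whole box, padding and borders included, to come out 400px wide will see the correct 462px result as a browser bug, and vote accordingly.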
    We could always restrict the voting to a trusted group of experts, 
but who decides who's trusted and expert enough to be a part of that 
group?  And when their collective voting ranks Browser Z dead last, 
how long before there are charges of an "anti-Z agenda"-- especially 
if Browser Z doesn't have any employees in the group?  If you ban all 
browser-maker employees from joining the trusted group, your pool of 
potential group members drops precipitously, because the experts tend 
to get hired by companies who need their expertise.  It only gets 
worse if you ban former employees as well.
    And these have just been the theoretical problems I've pondered 
over the years.  I haven't even gotten into questions like, "If the 
test suite uses 'object' and scripting to run the tests, and a 
browser doesn't support either 'object' or the script, then how can 
you test it at all?"  Or the classic: "Is this test actually correct
in what it asserts?"  I had a few that were wrong but were eventually
corrected, and there was at one time a widely-cited test suite that 
had a lot of incorrect assertions that were never fixed.  There's a 
whole host of other such not-purely-CSS-but-still-relevant potential 
obstacles.
    I'll admit that this all could very well serve as a hideously 
extended mea culpa over the fact that I stopped updating my CSS 
support charts in 2001, but at least now you know why I did.  The 
problem set simply became far too intractable for me to handle.
    I will also freely admit that there may be an approach to testing 
and compliance rating that I have simply never considered, one that 
avoids the various problems I've raised.  If so, the old support 
charts [4] are under a Creative Commons license that allows them to be
used as the nucleus for a new scorecard, if that would help.  If not, 
of course, feel free to ignore them.
    I've gone on more than long enough, but hopefully I've said a few 
things that help provide insight into the situation.


[1] <http://www.w3.org/Style/CSS/Test/CSS1/current/>
[2] <http://www.w3.org/TR/html401/index/elements.html>
[3] <http://www.w3.org/Style/CSS/Test/CSS3/Selectors/current/html/>
      (one version of it, anyway)
[4] <http://devedge.netscape.com/library/xref/2003/css-support/>

-- 
Eric A. Meyer  (eric@meyerweb.com)    http://www.meyerweb.com/eric/
Principal, Complex Spiral Consulting  http://www.complexspiral.com/
"CSS: The Definitive Guide," "CSS2.0 Programmer's Reference,"
"Eric Meyer on CSS," and more   http://www.meyerweb.com/eric/books/
Received on Saturday, 6 March 2004 10:24:35 UTC