Re: Automated Test Runner from Aryeh Gregor on 2011-02-20 (public-html-testsuite@w3.org from February 2011)

From: Aryeh Gregor <Simetrical+w3c@gmail.com>
Date: Sat, 19 Feb 2011 19:42:06 -0500
To: "L. David Baron" <dbaron@dbaron.org>
Cc: James Graham <jgraham@opera.com>, Kris Krueger <krisk@microsoft.com>, Anne van Kesteren <annevk@opera.com>, "public-html-testsuite@w3.org" <public-html-testsuite@w3.org>, "Jonas Sicking (jonas@sicking.cc)" <jonas@sicking.cc>
Message-ID: <AANLkTimPNtVkWq4JCiBG2rRb-YGgbQsgY7Q0O5BxJz7K@mail.gmail.com>

On Fri, Feb 18, 2011 at 4:01 PM, L. David Baron <dbaron@dbaron.org> wrote:
> This is one of a number of reasons that I don't think it makes sense
> to publish pass percentages.  They're not a useful metric,
> especially in an environment where different vendors can contribute
> tests (especially large numbers of tests that don't actually provide
> much real coverage) in order to skew that metric.
>
> Instead, it would make more sense to publish "does implementation X
> pass all tests for feature Y".

That makes sense to me.  We want some report of how well different
implementations implement various parts of the standard, but "all
tests for feature Y" seems like it would be good enough for that
purpose, and it's much more meaningful than a percentage.  Of course,
this would only be for things where we have a decent approved test
suite -- within HTML, it seems that means only canvas right now.
(IMO, my base64 and reflection tests also qualify, but no one's
reviewed them yet.)
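
As a rough illustration of the difference between the two metrics, here
is a minimal sketch of how a per-feature report could be computed next
to an overall percentage.  Nothing here comes from an actual test
harness: the result data, the browser names, and the function names
passes_all and pass_percentage are all made up for the example.

```python
from collections import defaultdict

# Hypothetical results: (implementation, feature, test name, passed).
# None of these names or outcomes come from a real test run.
RESULTS = [
    ("BrowserA", "canvas", "fillRect-basic", True),
    ("BrowserA", "canvas", "arc-angles", True),
    ("BrowserA", "base64", "atob-whitespace", False),
    ("BrowserA", "base64", "btoa-latin1", True),
    ("BrowserB", "canvas", "fillRect-basic", True),
    ("BrowserB", "canvas", "arc-angles", False),
    ("BrowserB", "base64", "atob-whitespace", True),
    ("BrowserB", "base64", "btoa-latin1", True),
]


def passes_all(results):
    """Map (implementation, feature) -> True only if every test passed."""
    report = defaultdict(lambda: True)
    for impl, feature, _test, passed in results:
        report[(impl, feature)] = report[(impl, feature)] and passed
    return dict(report)


def pass_percentage(results):
    """Map implementation -> percentage of all tests passed, for comparison."""
    total = defaultdict(int)
    ok = defaultdict(int)
    for impl, _feature, _test, passed in results:
        total[impl] += 1
        ok[impl] += passed  # True counts as 1, False as 0
    return {impl: 100.0 * ok[impl] / total[impl] for impl in total}


if __name__ == "__main__":
    for (impl, feature), all_ok in sorted(passes_all(RESULTS).items()):
        print(f"{impl} {feature}: {'pass' if all_ok else 'fail'}")
    print(pass_percentage(RESULTS))
```

In this made-up data both browsers get the same overall percentage, yet
they fail different features, which is exactly the information the
per-feature report keeps and the percentage throws away.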

The only problem I'd see is that if we have very thorough test suites,
this might set the bar too high.  In other words, it might be such a
pain to get all the details right, relative to web-compat issues, that
browsers wouldn't bother for some test suites.  But we can deal with
that as it happens, on a case-by-case basis.  I think your proposal
would be a good one to adopt for now.

I get the impression that Opera wants to have a fixed number of tests
because their internal test runner expects that.  Maybe someone from
Opera can clarify this.  At this point I see no particular reason that
we'd need to always run the same number of tests.

Received on Sunday, 20 February 2011 00:43:00 UTC