Re: Calling All Tests

On Mon, Jun 8, 2015 at 11:42 AM, Joe Berkovitz <joe@noteflight.com> wrote:

> Thanks for all the great responses everyone. I'll try to respond to
> Raymond's points as best I can just to advance the discussion, since Paul
> seconded his questions. This is just a starting point...
>
>> However, I do have some questions about this.
>>
>>    - To be truly useful for Chrome, we would need to be able to import
>>    the testsuite directly and use it without having to modify anything.
>>    (Perhaps there is some setup code, but not the tests themselves.) This
>>    means, at least, that it can be run as is within Chrome's testing
>>    infrastructure so that we get continuous integration testing.
>>
>>
> If we upstreamed everything from the Chrome and Moz suites into the WPT
> repo (which already contains a few Web Audio tests), would that suffice?
>

That's a start. I would assume we'd remove any duplicates, merge tests that
do almost the same thing, and so on.

However, I was more concerned about the reverse.  We have the WPT repo, so
how do I get that into Chrome's testing infrastructure without modifying
any test in the WPT repo? Without this, Blink won't find any utility in the
repo and will probably just continue to use its existing test suite.
Sadly.
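
To illustrate what "without having to modify anything" has to mean in
practice, here is roughly the shape of a vendor-neutral test that leans
only on the shared WPT testharness.js and the spec'd API surface (an
illustrative sketch, not an actual test from the repo):

    <!DOCTYPE html>
    <title>OfflineAudioContext basic construction</title>
    <script src="/resources/testharness.js"></script>
    <script src="/resources/testharnessreport.js"></script>
    <script>
    test(function() {
      // Depends only on the spec'd API and the shared harness, so any
      // vendor's runner should be able to execute it unmodified.
      var context = new OfflineAudioContext(1, 128, 44100);
      assert_equals(context.sampleRate, 44100,
                    "OfflineAudioContext reports the requested sample rate");
    }, "OfflineAudioContext basic construction");
    </script>

Anything a vendor needs beyond that (bot configuration, result baselines,
and so on) would have to live on the vendor's side, not in the test.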


>
>>    - What exactly is the scope of this testsuite? Is it meant to be a
>>    kind of compliance test where if you pass, you are compliant?  And what if
>>    you don't pass? Does that mean you can't say you support WebAudio?
>>
> My belief is that this should be a test suite that identifies
> noncompliance in some set of areas. So if you fail the tests, your
> implementation is definitely not compliant.
>
> However I don't think we need to sign up for the converse being true (that
> if you pass the tests, your implementation is compliant). That would
> require the suite to achieve total coverage in the testing of every aspect
> of the API including many obscure issues we have not thought of or defined
> yet. So I believe the test suite will continue to be an evolving work in
> progress, and thus can't absolutely assure compliance.
>

I assume that, even with some test failures, you can still call yourself
an implementation of WebAudio that purports to conform.

>
>
>>    - How accurate must the tests be?  There can be a lot of floating
>>    point going on, so differences between platforms and browsers are
>>    expected.  How is it decided that a result is accurate enough? We have this
>>    issue on Chrome today between Linux, OSX, Windows, and Android. This gets
>>    greatly multiplied once other browsers are added.
>>
> We should have some definable threshold for errors that we agree on.
> Obviously exact floating point comparisons are not going to work. I think
> proposals are needed.
>
> To aid in this, any fuzzy comparison techniques should be defined in some
> central place that applies across the test suite, so that the approach can
> be adjusted as the group moves forward.
>
>
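
To make that concrete, here's a straw man for the central helper (the
name, signature, and threshold scheme below are placeholders for the
group to argue over, not actual proposed numbers):

    // audio-test-util.js: hypothetical shared helper that every test
    // would include instead of rolling its own comparison; assumes
    // testharness.js's assert_true is already loaded.
    // Passes when |actual| is within an absolute epsilon of |expected|,
    // or within a relative tolerance for larger magnitudes.
    function assert_audio_close(actual, expected, absEpsilon, relEpsilon,
                                description) {
      var diff = Math.abs(actual - expected);
      var bound = Math.max(absEpsilon, relEpsilon * Math.abs(expected));
      assert_true(diff <= bound,
                  description + ": |" + actual + " - " + expected +
                  "| = " + diff + " exceeds " + bound);
    }

Having a single definition also means the thresholds can be tightened or
loosened in one place as implementations converge.
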
>>
>>    - How will the tests be managed? Say a new test for Firefox is added
>>    but fails on Chrome. Now what? How is this resolved? Fisticuffs?
>>
> How is this done today for specs that already possess extensive and
> reasonably agreed test suites?
>
> I would think if there are conflicts in test results which vendors don't
> agree on, then they ultimately reflect problems with the spec that the
> group needs to address. Today we seem to manage this kind of progress
> mostly without fisticuffs (although it is tempting, I know...)
>

That was a test to see if anyone was reading. :-)

I'm not aware of anything today, but I can imagine some test where vendors
will agree to disagree, because spec'ing the difference is too difficult or
implementing the solution is too onerous for some vendor. Perhaps we can
cross that bridge when we get to it.


>
> So the Chrome team could a) agree the test is valid and file a bug (and
> eventually fix Chrome), or b) claim that it's invalid and take a spec
> dispute to the WG, which could go either way.
>
> By the way, my hope would be that new tests are not added specifically
> "for FF" or "for Chrome" but are added to a central repo from which all
> vendors would periodically pull.
>

My assumption here was that, say, FF finds a bug in their implementation,
wants a test for it, and pushes that test to the repo, which then causes
Chrome to fail that test (for whatever reason).  We wouldn't want to block
FF from having a regression test, but we also wouldn't want to be blocked
from updating because the test would cause failures on our test bots.
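
(In Blink the unblocking mechanism would presumably be our expectations
file rather than deleting the test. Something like the line below, with
the syntax from memory and a placeholder bug number and test path. That
still leaves the test marked as failing until someone fixes it, which is
the part that needs an agreed process.)

    # third_party/WebKit/LayoutTests/TestExpectations
    crbug.com/NNNNNN webaudio/hypothetical-ff-regression.html [ Failure ]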


>
>
>>    - Who is responsible for verifying the tests? And for testing with
>>    the various browsers? Will this be a "reference" that each browser
>>    vendor can pull from and use?  And then any issues get raised to be
>>    resolved by the group?
>>
>>
>>
> I am a little unsure about the best way to proceed for QA on the tests
> themselves. I think once we pull them all together and can run them on all
> the major platforms, we'll have a better idea how much stuff needs to be
> reconciled and how complicated the situation on the ground actually is.
>
> Ultimately, though, I believe we should wind up with a single reference
> suite of tests that run successfully on all compliant browsers, and any new
> tests will need to run through some sort of approval process to get into
> that suite. Probably we need to have some conventions with respect to
> release and dev branches of the test suite itself.
>

I won't have time to do this myself in the near future, but I would like to
pull a few tests over from the WPT repo and run them to see how they would
work in Chrome.  Perhaps everything will work just fine.

--
Ray


> ...Joe
>
>

Received on Monday, 8 June 2015 20:15:37 UTC