Re: Ringmark, Core Mobile, and the Financial Times

Hi James,

On Jun 12, 2012, at 10:47, James Graham wrote:
> On 06/11/2012 12:38 PM, Robin Berjon wrote:
>> I think that we need to spend more time working on Quality of
>> Implementation testing. It's not easy, but it's very valuable. I
>> don't know of a single W3C group that has that on its radar — and
>> it's definitely something that CoreMob could do. Allow me to start
>> hashing some details out below, and let's see if they stick
> 
> Since we do so badly with the low hanging fruit (testable assertions), I don't think that we are at the point where expanding into maintaining public lists of hard-to-test QoI issues should be a priority.

That may well be true, but only up to a point. We have 250 participants here, several of whom have indicated that they have resources to dedicate to testing. Yet I'm not seeing much in the way of contributions beyond the initial Ringmark, which people complain is far too focused on feature testing.

So we have resources on one side and testing work that needs doing on the other, yet somehow the two aren't meeting, and I'd like to know why. It would certainly help if some of the 200+ lurkers were to speak up!

Perhaps something is missing that would make it easier, simpler, clearer, or better incentivised to apply the former to the latter. If that's the case, I'd like to know how we can help.

Or perhaps these resources have interests in testing that are somewhat different from strict conformance testing, in which case they ought to be allowed to work on what they see as most valuable. Put differently, I'd rather get no new conformance tests from this group but some QoI/performance tests than no new conformance tests and nothing at all.
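
To make the distinction concrete, here is roughly what I have in mind. This is just a sketch typed into the mail, not a proposed test: the frame-counting trick is only a rough proxy for smoothness, requestAnimationFrame may still need a vendor prefix, and the 30fps bar is a number I'm throwing out for illustration.

// Ringmark-style feature test: it only proves that the property exists.
var supportsTransitions = "transition" in document.body.style ||
                          "webkitTransition" in document.body.style;

// QoI-style test: run a transition for real and count the frames we get.
function measureTransitionSmoothness(durationMs, done) {
  var box = document.createElement("div");
  box.style.cssText = "width:50px;height:50px;background:red;" +
                      "-webkit-transition:all " + durationMs + "ms linear;" +
                      "transition:all " + durationMs + "ms linear";
  document.body.appendChild(box);
  void box.offsetWidth;              // force a style flush so the transition runs
  box.style.marginLeft = "200px";    // kick it off

  var frames = 0, start = Date.now();
  (function tick() {
    frames++;
    if (Date.now() - start < durationMs) {
      requestAnimationFrame(tick);
    } else {
      done(frames * 1000 / durationMs);   // effective frames per second
    }
  })();
}

measureTransitionSmoothness(1000, function (fps) {
  // Where to put the bar is exactly the judgement call this group would need to make.
  console.log(supportsTransitions, fps >= 30 ? "smooth enough" : "too janky");
});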

> The exception to this is performance benchmarks; it is always useful to get good benchmarks. But it turns out that making *good* benchmarks is very challenging, even for people that you might expect to be competent (see [1] for a random example that I happen to recall. It is by no means the only example).

You forgot to include the link :)
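
For the lurkers, though, here's the sort of trap I assume you mean (a made-up sketch of my own, not whatever [1] was): timing canvas calls without forcing the work to complete mostly measures how fast calls get queued, not how fast anything is actually drawn.

var canvas = document.createElement("canvas");
canvas.width = 200;
canvas.height = 200;
var g = canvas.getContext("2d");

var start = Date.now();
for (var i = 0; i < 100000; i++) {
  g.fillRect(i % 200, (i / 200) | 0, 1, 1);
}
var queued = Date.now() - start;   // mostly measures API call overhead

// Reading a pixel back forces any deferred rendering to complete, so this
// second number is much closer to the real drawing cost.
g.getImageData(0, 0, 1, 1);
var flushed = Date.now() - start;

console.log("naive: " + queued + "ms, after flush: " + flushed + "ms");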

>> 1. CSS transitions at reasonable speed.
>> 2. Canvas running at reasonable speed.
>> 3. CSS font family fallback used during font loading.
>> 4. Reasonable audio latency and mixing.
>> 5. AppCache working in reasonable ways.
>> 6. Decent UIs for new HTML form control types (unlike the <input type=date> "support" in Safari).
>> 7. Have the UI behave "helpfully" on quota increases.
> 
> Apart from the problems that others have mentioned e.g. hardware dependence, these also have the problem of being largely subjective. What is "reasonable" audio latency and mixing?

Yes, these are unabashedly subjective. I don't think that's a core problem — it's just a property of the system which we can work with.

Reasonable audio latency is when you can play a game that goes "BOOM poorsch BANG BANG" often and it actually feels related to your actions rather than random. There's enough mixing if the music can keep playing while two zombies go "JAAAAMESS BRAAAAINS" and the player clubs them over the head with a hardcopy of the HTML spec, all at the same time and without noticeable glitches.
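
Which suggests that even a "subjective" criterion can be backed by a reproducible, if manual, test page: a script sets up the scene and a human makes the call. A quick sketch of what I picture (glossing over vendor prefixes and older Web Audio method names; the sounds and events are just whatever came to mind):

var AudioCtx = window.AudioContext || window.webkitAudioContext;
var ctx = new AudioCtx();

// Mixing: keep a background drone going while effects fire on top of it.
var drone = ctx.createOscillator();
var droneGain = ctx.createGain();
drone.frequency.value = 110;        // a low hum standing in for the game music
droneGain.gain.value = 0.1;
drone.connect(droneGain);
droneGain.connect(ctx.destination);
drone.start(0);

// Latency: fire a short blip the moment the user taps; a human then judges
// whether the BANG feels tied to the tap or noticeably late.
document.addEventListener("touchstart", function () {
  var blip = ctx.createOscillator();
  blip.frequency.value = 880;
  blip.connect(ctx.destination);
  blip.start(ctx.currentTime);
  blip.stop(ctx.currentTime + 0.05);   // 50ms blip
}, false);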

> What is "decent" UI?

That's a tougher one. I'm not saying that the list above is anywhere near good; it's just what came to mind based on feedback from developers.

> Form controls are a slightly different matter as some vendors have been known to rush out broken implementations just to appear like they have support for a feature on feature-testing sites. But to the extent that one can have a minimal set of criteria to implement a feature they could be actual (manual) tests.

That would WFM, but I would definitely flag it as a priority area (especially for mobile).
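
To make it concrete, here's roughly what I picture such a manual test looking like: the scriptable part can only ever check the claim, and the rest has to be a human judgement. A sketch only, with made-up wording:

var input = document.createElement("input");
input.setAttribute("type", "date");
document.body.appendChild(input);

// Unsupported types fall back to "text", so this much is automatable; whether
// the picker that shows up is actually usable is left to the tester.
var result = document.createElement("p");
result.textContent = input.type === "date"
  ? "Claims support: tap the field and judge whether the picker is usable."
  : "FAIL: type=date is not supported (it falls back to text).";
document.body.appendChild(result);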

>> I've been thinking about how best to get input from developers, and
>> frankly if you have to get them to find the right spec against which
>> to test and then the right group to which to submit it's just too
>> much work for all but the most dedicated. Having us do the triage,
>> perhaps partly automated, might make more sense (especially if we can
>> then get help on the triage bit as a unit of work made simple
>> enough).
> 
> I am slightly worried that people who aren't reading the spec are unlikely to be writing correct testcases. Working out what the expected behaviour is in complex cases is non-trivial (and not always just "whatever $popular_engine does" — especially since the cases of interest will presumably show browser differences).

Yes, but this is a chicken-and-egg problem: how do you bootstrap enough people into reading specs well enough that they can then submit tests? Getting a critical mass to read specs well is proving (predictably) hard; I wonder if starting from test cases wouldn't be simpler.

> Also there hasn't been that much success in getting (browser) people to do upfront review of public testcases; no one's job description includes that task so it doesn't happen much.

Which is why I was considering getting help on the triage as well.

> On the other hand having too many testcases to handle would be a new and interesting problem to have so I am not opposed to reducing the accidental complexity of making and submitting tests.

I'll think about it some more — this could be a good job for test-infra.

-- 
Robin Berjon - http://berjon.com/ - @robinberjon

Received on Tuesday, 12 June 2012 10:55:39 UTC