Re: Towards a better testsuite

On Thu, Mar 24, 2016 at 11:30 AM, Geoffrey Sneddon <me@gsnedders.com> wrote:

> On Thu, Mar 24, 2016 at 6:01 PM, Dirk Pranke <dpranke@chromium.org> wrote:
> > Hi,
> >
> > Thanks for the update. Here's a few additional minor bits of info ...
>
> I was waiting for people to correct me. :)
>
> > Blink pulls updates of (a small subset of) web-platform-tests manually but
> > reasonably frequently, probably more than once a week. We could automate
> > this easily but so far haven't.
> >
> > We would ideally like to run most if not all of w-p-t but probably won't
> > until we can figure out how to best de-duplicate the surely large overlap
> > between w-p-t and our own tests (or until we decide we can afford to run
> > twice the number of tests we run today).
> >
> > We also pull (a very small subset of) csswg-test but do so much less
> > frequently.
> >
> > We have no plans to import and run all of csswg-test, mostly because the N
> > thousand manual tests would require us to have new pixel references for
> > them. However, it would be good to import more of the test suites, under
> > the same conditions as w-p-t above (figuring out how to de-dup things, or
> > accepting the overlap).
>
> We're making decent progress on getting rid of the screenshot tests (I
> think we're down to ~4k now). Certainly the expectation is nobody will
> start running new screenshot tests at all regularly. (Except, maybe,
> Microsoft. And they really don't want to.)
>

A large percentage of Blink's existing screenshot tests actually come
from older versions of the CSS test suites, so I expect we could actually
just upgrade those versions to the latest in-repo tests without increasing
the overall number of screenshots too much, but that is work that
hasn't been done yet.

> Insofar as performance goes: as I understood it, there was a belief
> that it was plausible to run them all on CI with extra hardware, and
> dealing with the performance that way? Still makes performance of
> running everything local terrible, admittedly!
>

For desktop machines, yes, that's certainly plausible. Even for mobile
we are theoretically capable of doing so. However, it's unclear if
the ROI of running the two sets of tests is great enough to be worth it.


> > We do not have a good process worked out for automatically upstreaming
> > tests and fixes back to the w3c repos, but we do have people regularly
> > contributing tests upstream.
>
> So what Mozilla does (for both Gecko and Servo) with
> web-platform-tests is have two-way syncing between their copy and
> upstream, so from a committer's point-of-view all that's needed for it
> to get upstreamed is landed within their tree. This essentially relies
> on the policy that web-platform-tests only requires *a* public
> review—and hence there's willingness to trust those downstream. (Yes,
> in theory this could go horribly wrong; no, in practice, it never
> has.)
>

That is an interesting approach; it seems like it would be worth having our
folks follow up with you further on this.


>
> > I believe at least some subsets of w-p-t and csswg-test have been imported
> > into WebKit in the past; the import process is (or at least was) roughly
> > the same between Blink and WebKit.
> >
> > I thought the plan of record already was to merge csswg-test into w-p-t? I
> > strongly support that.
>
> There were some strong objections before, mostly related to how
> Shepherd interacts with the (Mercurial) repo.
>

Yes, I remember those objections. I thought the decision had been
made regardless, but since no one apparently signed up to do the work
to migrate, it was perhaps an empty decision.

-- Dirk


>
> > The v8 team runs test262 as part of their CI, and my impression was that
> > they kept it up to date, but I'd have to check further to confirm that; I
> > do not know how much they have contributed back.
> >
> > Given that test262 and the ECMA committee follow a fundamentally different
> > process, and that the JS tests can usually be run by a standalone JS engine
> > without requiring a browser, I see little advantage to trying to merge
> > test262 into w-p-t or otherwise mess with their processes; I think that's
> > outside the scope of this group.
>
> Agreed; I listed that mostly for the sake of including it as one of
> the test suites that browser vendors run. Its issues are really
> tangential given it's mostly targeting being run in a different
> program.
>
> > (I have not been following this stuff at all closely for a year or more, so
> > maybe I've missed things. In particular I don't know the current state of
> > WebKit and I hope I am not contradicting anything that rniwa@ said in the
> > meeting).
> >
> > -- Dirk
> >
> > On Thu, Mar 24, 2016 at 10:00 AM, Geoffrey Sneddon <me@gsnedders.com> wrote:
> >>
> >> Hi,
> >>
> >> A bunch of us got together at TPAC last year to talk about how we
> >> could make the CSS testsuite more useful to more people.
> >> <https://www.w3.org/2015/10/28-testing-minutes.html> has minutes of
> >> the breakout session we had on Plenary day, though plenty more talk
> >> happened at other times. There's been some discussion on
> >> public-css-testsuite too, for those of you who don't subscribe there
> >> (you totally should!). Having spoken to Alan, we thought it was best
> >> to take much of this to the whole group. I'll probably start
> >> reappearing on telecons soon, too, all with ulterior motives, of
> >> course.
> >>
> >> Fundamentally the situation with testing the web platform at the
> >> moment is that we essentially have three large repositories of tests:
> >> test262 (from ECMA TC-39, focused on their specs), csswg-test (from
> >> us), and web-platform-tests (covering everything else). From the point
> >> of view of all browser vendors, it'd be great if we were all sharing
> >> the vast majority of tests we write—and we're not anywhere near there
> >> yet.
> >>
> >> The current status, as I understand it, is: test262 I believe people
> >> are mostly running old versions of and contributing little to;
> >> Microsoft is running weekly updated versions of csswg-test and Gecko
> >> is running a several-year-old version with no realistic plan to update
> >> it, nobody contributes that much (a tiny subset of Gecko stuff is
> >> automatically synced, but the vast majority is not);
> >> web-platform-tests is run by Microsoft semi-regularly, is run with
> >> two-way syncing from Gecko and Servo, with plans by Blink and
> >> Microsoft to get there AIUI, and with more in the way of contributions
> >> than either of the other two repositories. WebKit just aren't running
> >> anything, far as I'm aware. The only other group I'm aware of running
> >> anything is Prince, running a small subset of an old version of
> >> csswg-test.
> >>
> >> Speaking to people across all browsers about why they're generally
> >> contributing more to web-platform-tests than csswg-test, there's a
> >> more or less uniform answer: there's too much friction. Metadata and
> >> review are the two parts held up as reasons; metadata because it means
> >> that existing tests can't simply be added to the testsuite (and you're
> >> then having to justify far more time to release the tests), review
> >> because often enough comments come back months later by which time
> >> everyone working on it has moved on and doesn't have time to address
> >> minor nits.
> >>
> >> I went through all of the metadata in
> >> <https://lists.w3.org/Archives/Public/public-css-testsuite/2015Oct/0004.html>;
> >> there's rough agreement on removing much of what we currently have.
> >> That said, I think it's worthwhile to reiterate that requiring *any*
> >> metadata causes friction. Tests written by browser vendors are rarely
> >> a file or two which is quick to add metadata to. I know in general
> >> people seem interested in using the same infrastructure to run both
> >> web-platform-tests and csswg-test, which essentially requires the
> >> metadata required to run the tests be identical across the two.
> >>
> >> The other significant complication when it comes to csswg-test is the
> >> build system. Because of the necessity of building the testsuite first
> >> it makes it more complicated to fix tests when they're failing; you
> >> have to know how to find the source file and then be able to build the
> >> testsuite after fixing it. Historically the build system existed to
> >> deal with the variety in what UAs support, whether they supported HTML
> >> and/or XHTML; this is a far smaller deal nowadays—of the UAs running
> >> the CSS testsuite or likely to do so in the future, the only one I'm
> >> aware of that doesn't support both well is Servo (and that's likely to
> >> change so can possibly be ignored here).
> >>
> >> This is, of course, all complicated by the need to be able to
> >> demonstrate that CR exit criteria have been met. For things in
> >> web-platform-tests, as I understand it, there's normally just been a
> >> copy of one directory that's been used to create the CR-exit
> >> testsuite. Aside from this, it's assumed that tests don't need to be
> >> run on a per-spec basis: yes, some things will get missed when
> >> building the CR-exit testsuite because they're in another directory
> >> because they're testing how multiple specs interact, but it's not
> >> worth the extra complexity to handle this, given the loss is normally
> >> small.
> >>
> >> The other notable difference is in tooling: Mercurial is used with a
> >> git mirror, and then reviews are done split across Shepherd and
> >> public-css-testsuite and some issues filed on GitHub, and with some
> >> people expecting some pre-landing review through GitHub PRs and with
> >> some people pushing directly to Mercurial… Really everything would be
> >> simpler if we had *one* single way to do things. I'd much rather have
> >> everything on GitHub, review happening on PR submission, and nits and
> >> such like be reported as GitHub issues. This keeps everything in one
> >> place with tools most people are used to.
> >>
> >> To outline what I'd like to see happen:
> >>
> >> - Get rid of the build system, replacing many of its old errors with
> >> a lint tool that tests for them.
> >> - Policy changes to get rid of all metadata in the common case.
> >> - Change the commit policy. (Require review first, require no new lint
> >> errors.)
> >>
> >> Long-term it's probably worth considering merging the whole thing into
> >> web-platform-tests, so we have all the W3C tests in one place.
> >>
> >> I realise this omits a lot of detail in a fair few places: I'm just
> >> trying to start off with something not *too* ginormous. :)
> >>
> >> /gsnedders
> >>
> >
>
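
For illustration only: the outline quoted above proposes replacing the build
system's errors with a lint tool and requiring "no new lint errors" on commit.
A minimal sketch of that idea might look like the following. The rule names
and patterns are hypothetical (the thread does not list the actual build
errors), and lint_text / new_errors are made-up helper names, not part of any
existing csswg-test or web-platform-tests tooling.

```python
import re
from collections import Counter
from typing import List, Tuple

LintError = Tuple[str, int, str]  # (path, line number, rule name)

# Hypothetical rules: stand-ins for whatever the build system used to reject.
RULES = [
    ("TRAILING WHITESPACE", re.compile(r"[ \t]+$")),
    ("TAB INDENT", re.compile(r"^\t+")),
    ("CR AT EOL", re.compile(r"\r$")),
]


def lint_text(path: str, text: str) -> List[LintError]:
    """Return one error per (line, rule) that matches in the given file text."""
    errors = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for name, pattern in RULES:
            if pattern.search(line):
                errors.append((path, lineno, name))
    return errors


def new_errors(before: List[LintError], after: List[LintError]) -> List[LintError]:
    """Errors in `after` not accounted for by `before`.

    Errors are matched per (path, rule) count rather than by line number,
    so pre-existing errors that merely move do not block a commit.
    """
    allowed = Counter((path, name) for path, _, name in before)
    fresh = []
    for err in after:
        key = (err[0], err[2])
        if allowed[key] > 0:
            allowed[key] -= 1
        else:
            fresh.append(err)
    return fresh


if __name__ == "__main__":
    old = lint_text("example.html", "bad \n")
    new = lint_text("example.html", "x\nbad \n\ty\n")
    for path, lineno, name in new_errors(old, new):
        print(f"{path}:{lineno}: {name}")
```

Comparing against the pre-change error list, rather than requiring a clean
run, is one way existing tests could be imported as-is while still holding
new changes to the policy.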

Received on Thursday, 24 March 2016 19:02:49 UTC