- From: Michael[tm] Smith <mike@w3.org>
- Date: Thu, 3 Nov 2011 09:00:19 +0900
- To: public-test-infra@w3.org
http://www.w3.org/2011/10/28-testing-minutes.html
28 Oct 2011
Agenda
http://lists.w3.org/Archives/Public/public-test-infra/2011OctDec/0014.html
Attendees
Present
Jeanne_Spellman, Bryan_Sullivan, Wilhelm_Andersen,
James_Graham, Elika_Etemad, Jason_Leyba, Simon_Stewart,
Kris_Krueger, John_Jansen, Peter_Linss, Mike_Smith,
Alan_Stearns, Narayana_Babu_Maddhuri, Duane_O'Brien,
Charlie_Scheinost, Ken_Kania, Jeff_Hammel, Clint_Talbert,
Tab_Atkins, Michael_Cooper, Philippe_Le_Hegaret
Chair
Wilhelm_Andersen
Contents
* Topics
1. Introductions
2. Agenda Overview
3. WebDriver API
4. Testing IE
5. Testing Firefox
6. Testing Opera
7. Testing in the CSS WG
8. Testing Chrome
9. testharness.js
10. How should we organize public test suites so that they
are as easy as possible to contribute to and reuse?
11. Additional Items
12. Conclusions and Action Items
* Summary of Action Items
_________________________________________________________
<MichaelC_SJC> scribeNick: MichaelC_SJC
Introductions
wa: testing helps everybody
figure out how to make best possible test suites
<plh> Wilhelm: I'd like to figure out how to make the best possible test
suite, how to make the Web better
I work for Opera as testmonkey, test manager
in various parts
jg: also work for Opera
<missed the rest>
ee: also known as fantasai
work on testing in CSS WG
jl: work on testing in Google
want to improve the ecosystem so it all works better
ss: created Webdriver, working Selenium
very aware of the differences between browsers, would love to sort
it out
kk: worked in testing at Microsoft
more recently on Web standards
jj: also at Microsoft
interested in automation, test suites
pl: co-chair of CSS WG
have contributed extensively to that test suite
and working on test shepherd for <missed>
ms: work for W3C, staff contact to HTML WG
work on testing for HTML, extensive contributions to framework
as: working for Adobe
interested in tests working across browsers
nm: represent Nokia
learn what's up
do: <missed>
<MikeSmith> https://browserlab.adobe.com/en-us/index.html <- Adobe BrowserLab
cs: represent adobe
<simonstewart> Ken_Kania
kk: work for google, Webdriver
bs: AT&T, mobile data services
interoperability in various fora
want to understand the challenges browser vendors have in automation
and how to leverage tools in repeatable continuous framework
to certify new devices as they come out, get updated, etc.
jh: Mozilla, test automation
ct: Mozilla, testing
ta: Google, work on Chrome
not as closely involved in testing, but have worked in CSS on some
<plh> involved in WAI, staff contact for PF, developing ARIA. we're
struggling in testing. hoping to contribute to the test framework
<plh> ... we have requirements that we'd like to bring as well
plh: W3C, Interaction Domain, lots of your favourite groups
want a common framework, common way to write tests
Agenda Overview
wa: first, want browser vendors to introduce how they do testing
then, presentations of a few testing approaches
finally, discussion of how to write tests for different types of
functionality
90% of tests cover how something is rendered to screen in a particular
way
or a script returns an expected result
or a user fills out a form and a certain result occurs
WebDriver API
ss: WebDriver is an API for automation of WebApps
developer-focused, guides people to writing better tests
Merged with Selenium a couple years ago
fairly simple, load page, find element, perform actions like focus,
click, read, etc.
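[Not captured in the minutes: a minimal sketch of the load/find/act flow ss
describes, written against the present-day selenium-webdriver Node.js
bindings rather than the 2011-era API under discussion; the URL and
locators are illustrative.]

  // npm install selenium-webdriver; assumes a matching browser driver is installed
  const {Builder, By} = require('selenium-webdriver');

  (async function sketch() {
    const driver = await new Builder().forBrowser('firefox').build();
    try {
      await driver.get('https://example.org/search.html');   // load page
      const box = await driver.findElement(By.name('q'));    // find element
      await box.sendKeys('webdriver');                        // type into it
      await box.click();                                      // click it
      const h1 = await driver.findElement(By.css('h1'));
      console.log(await h1.getText());                        // read a result back
    } finally {
      await driver.quit();
    }
  })();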
kk: does it simulate user input at driver level, or elsewhere?
ss: in past user interactions were done by simulating events in DOM
but browsers inconsistent in how they handle those
when they do what etc.
so events at script level not feasible
so do events at OS level
that is high fidelity but terrible machine utilization
and wastes developer's time
so now, allow window not to have focus and send events via various
OS APIs
but OS not designed to send high fidelity user input to background
window
so now, Opera and Chrome pump events into event loop of browser
<scribe not sure that was caught right>
Webdriver has become a de facto standard for browser automation
most popular open source framework
as can be seen by job postings requiring familiarity with it
has reasonable browser support
Opera, Chrome, and Android add-on, Mozilla starting
uses Apache2 license
business-friendly license
nm: tried on mobile browsers?
ss: yes, in various <lists>
it's a small team
covering wide range of browsers and platforms
see 3 audiences for automation
1) App developers are vast majority
need to test applications
hard to get developers to write tests, and can only get them to
write to one API when you get it at all
first audience for WebDriver
2) browser vendors
desire to automate their testing as much as possible
bs: how does Webdriver relate to qunit <sp?>
ss: <didn't catch details>
bs: so Webdriver isn't a framework, it's an API for automating
events
ss: clearly a browser automation API
e.g., understand Opera runs 2 million tests / day with this
3) Spec authors
some specs can be articulated entirely in script
and tested that way
others need additional support, this provides that
ee: more spec testers than authors?
ss: yes, those focusing on test aspects
... user perspective
it's a series of controlled APIs
to interrogate the DOM
execute script with elevated privileges
and provide APIs to interact, so not just read-only
jj: <question missed>
ss: <answer missed>
jj: avoids cross origin vulnerability?
ss: yes
bs: good, some complicated scenarios
ss: implementer view
neutral to transport and encoding
provide JSON
which means clients can handle it immediately
also have released JavaScript APIs
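[Not from the meeting: the JSON wire protocol documented on the Selenium
wiki (linked later in these minutes) maps each command to an HTTP request
with a JSON body, roughly along these lines; the session ID and element
handle are illustrative.]

  POST /session                      {"desiredCapabilities": {"browserName": "opera"}}
  POST /session/42/url               {"url": "http://example.org/"}
  POST /session/42/element           {"using": "css selector", "value": "input[name=q]"}
      -> {"sessionId": "42", "status": 0, "value": {"ELEMENT": "0"}}
  POST /session/42/element/0/click   {}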
ss: Security
<JohnJansen> My question was regarding the bypass of the x-origin
security restriction
ss: automation and security are opposite concerns
<JohnJansen> answer: the jscript still honors that restriction,
though webdriver itself ignores it.
generally, build support into browser
and enable it via an additional component
or command line features
ss: Demo
<shows short script, then executes>
kk: how Opera?
ss: Watir on top of WebDriver
... API designed to be extensable
expose capabilities via a simple interface or casting
jj: How are visual verifications handled?
ss: can take a screenshot, platform-dependent
Opera has extended with ability to get hash of the screenshot
attempt to capture entire area described by DOM, not just viewport
deals with difficulties like fixed positioning etc.
but very browser specific
jj: human comparison mechanism?
ss: in google, teams of people do that
we just provide the mechanism
don't want to over-prescribe how to process images, as state of the
art continually changes
bs: to compare layout between different browsers
capture screens, or query position of elements?
ss: can do both
can get location of an element
and size
bs: how about different screens sizes
interested in specifically how things rendered in various
circumstances
ss: the locatable interface can provide various types of measures
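[Not from the meeting: in the JSON wire protocol those measures, and the
screenshots discussed above, are plain GET commands, roughly as follows;
the values are illustrative.]

  GET /session/42/element/0/location   -> {"status": 0, "value": {"x": 16, "y": 250}}
  GET /session/42/element/0/size       -> {"status": 0, "value": {"width": 300, "height": 40}}
  GET /session/42/screenshot           -> {"status": 0, "value": "<base64-encoded PNG>"}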
kk: differences among browsers are wide for many reasons
it's part of the landscape
ss: was able to use same tests using same APIs
at rendering level can be different
plh: platform AAPIs use similar services
hope e.g., ARIA can use WebDriver
ss: have looked at AAPIs, can look at elements by ARIA role etc.
on relationship to AAPIs
sometimes they're enough, sometimes not
one of the next big things in hybridized apps, part native and part
Web
may need to use AAPIs to test
plh: think ARIA can be tested using this
ss: have applied Webdriver to native app testing using AAPIs
kk: there has been a path starting with MSAA
ss: AAPIs are extremely low-level
e.g., a combobox is represented as a few different controls together
kk: developers create all kinds of crazy things
so UI automation allows patterns
mc: can speak to AAPI from WebDriver
ss: Webdriver sits on top of AAPI
but because of script interface, could talk back and forth a bit
wa: Opera has a layer "Watir" on top of WebDriver
<shows sample>
test file looks like a manual test, e.g., a human could interact
with it
<demos manual execution of test>
<that can also be executed using the script showed previously>
for each test file, there's a block in the automation script
ss: Webdriver similar
nm: <missed>
ss: <answer related to webelement.gettext>
jj: why wrapping in Watir
wa: was done before projects had merged
now doesn't matter as much
plan to submit Opera set of tests to HTML WG for official test suite
but want them in a format other browser vendors could use
Opera uses Ruby bindings, Mozilla uses Python bindings
need to automate in all browsers, Webdriver seems the way to go
for official W3C tests, question of what language binding to use?
ss: Javascript is hugely known
Python is the other one being explored by Mozilla and Chrome
also is "politically unencumbered"
vs some other candidates out there
<MikeSmith> I vote for Javascript
wa: how complete are JS bindings?
js: still finalizing
kk: <something detailed>
js: API stable
loading script within browser is the part that still needs working
on, to get around sandbox
it's usable now, but have debugging etc. to do
ss: so maybe Python preferable?
jg: having dependency on core could be a big stability issue
<^ not sure that's scribed right>
kk: dangerous to build on things that are changing
otoh, need bindings to be something that's available on all targets
ss: normally test and browser communicate like a client / server
can do over a web socket
and run test on machine independent of browser
wa: was able to test a mobile device on a different continent this
way
plh: if we set up a test server on W3C site, could you allow it to
just run tests at you?
ss: can connect from browser to a test server
so in theory, this works
but security concerns
need a manual intervention to put browser in testing mode
mc: have to trust W3C server from security POV
how we allow tests to be contributed needs to be careful
<general view of usefulness of this approach>
as: <missed>
<JohnJansen> as: is there support for IME? how good is it?
ss: support varies by platform as we prioritize development
<mentions wherefores and whynots>
do support internationalized text input
for testing I18N but could be used to test other stuff
do: how well documented is JS API?
ss: fairly extensive
<jhammel> http://code.google.com/p/selenium/wiki/JsonWireProtocol
Facebook developed PHP bindings using this documentation
Selenium stuff hosted under software freedom conservancy
can use w/o the open source stuff, but also handy to use the open
source stuff
wa: Just started Browser Testing and Tools WG
<jhammel> http://www.w3.org/2011/08/browser-testing-charter
primary goal is to standardize Webdriver API at W3C
<jhammel> (i think)
welcome you all to join to make this happen
also want to explore whether all browser vendors can handle official
test suites using Webdriver API
ss: aware of support from Google, Opera, Mozilla
explicit non-support from Microsoft, Apple, Nokia, HP
also support from RIM
plh: would Microsoft be able to accommodate tests using this?
kk: depends
standardization of the API will help a lot
<Another link for the WG is http://www.w3.org/testing/browser/>
also need tests structured in certain ways we can work with
<fantasai> kk: having the tests be self-describing is very
important. If I was a TV browser vendor that doesn't support
webdriver, I would want to be able to leverage the W3C tests as well
jg: tests always structured so you could run manually, though would
be ridiculous to do so with them all in practice
ms: first thing we need is a spec
doesn't matter where the editor's draft is hosted, can do at W3C
IP commitments kick in when we publish a Working Draft
ss, wa: ready to move right away on that
kk: W3C would own code?
ss: W3C would maintain spec
and a reference implementation
but there could be other implementations
mc: reference implementation doesn't necessarily have to be W3C
plh: spec is most important for W3C
ss: all Google testing in some way related to Webdriver
bs: supported in mobile?
ss: chrome and android
wa: also opera for mobile
bs: so other platforms is just lack of implementation?
ss: right; Nokia and Apple haven't implemented
just need a driver
kk: support IE6? want to get rid of that
ss: drop support when usage drops below a certain level
plh: support from Microsoft for Webdriver API will help HTML WG a
lot
jj: even if Opera submits tests and HTML adopts, they're
self-describing so still testable manually
plh: what does Nokia think?
nm: Nokia not really interested
focused on Webkit stuff
today is first time hearing about it
ss: it's not just about testing a spec, it's about ensuring users
can use content in your browser
so that market force should drive interest even if internal interest
is elsewhere
nm: how is performance?
ss: rapid on Android, but slow on emulator
iPhone is fast directly and in emulator
<something else> fast
nm: <missed>
<jhammel> ^ pixel verification
ss: haven't seen a lot of pixel verification on mobile devices
<scribe having a hard time hearing or understanding remainder of
discussion>
<MikeSmith> agenda: http://lists.w3.org/Archives/Public/public-test-infra/2011OctDec/0014.html
<dobrien> Could we get the minutes updated again as well please?
jj: propose not requiring webdriver in first version of test suite
<bryan> Scribenick: bryan
Testing IE
kk: To walk thru testing of IE
... shows slides "Standards and Interoperability"
<fantasai> IE testing diagram: Standards, Customer Feedback,
Privacy, Accessibility, Performance, Security
<fantasai> (these are pictured as hexagons around a central
"Internet Explorer" label)
kk: IE testing has various chunks as shown on the slide (slides to
be shared)
<fantasai> "Internet Explorer Testing Lab" w/ photo
<fantasai> IE5 -> IE10
<fantasai> 948 Workstations
<fantasai> 119 servers
<fantasai> 1200 virtual machines
<fantasai> remotely configurable
<fantasai> 152 versions of IE shipped every "Patch Tuesday"
<fantasai> Green Lab Initiative saves ~218 tons of CO2/Year
kk: IE testing lab using a lot of machines with a lot of IE versions
tested every week
<fantasai> "Standards Engagement"
<fantasai> ECMA
<fantasai> TC39 (Ecmascript 5)
<fantasai> W3C
<fantasai> - CSS
<fantasai> -WebApps
<fantasai> -HTML
<fantasai> -SVG
<simonstewart> Slides for the webdriver notes: https://docs.google.com/present/edit?id=0AVrYfCxRNKUGZGc5Nm1ocGhfNzFnaGd2bmZnYw
<fantasai> -XML
<fantasai> cycle diagram: Testing -> spec editing -> implementations
-> (loop back to Testing)
<fantasai> "Standard Contributions"
<fantasai> - Spec editing
<fantasai> -co-chairing
<fantasai> -test case contributions w3c and ecma
kk: encourage standards engagement and participation in various
groups
<fantasai> -- 14623 tests submitted
<fantasai> -- across IE9/IE9/IE10 features
<fantasai> - hardware (Mercurial server)
<fantasai> - IE Platform Preview Builds
kk: have contributed a lot of tests and hardware
... preview builds allow early access and feedback
<fantasai> "IE10 Standards Support"
<fantasai> CSS2.1, 2D Transforms, 3D Transforms, Animations,
Backgrounds and Borders, Color, Flexbox, Fonts, Grid alignment,
hyphenation, image values (gradients), media queries, multi-col,
namespaces, OM Views, positioned floats, selectors, transitions,
Values and Units
<fantasai> DOM element traversal, HTML, L3 Core, L3 Events, Style,
Traversal and Range
<fantasai> ECMAScript 5
<fantasai> File Reader API
<fantasai> File Saving
<fantasai> FormData
<fantasai> Geolocation
kk: IE 10 will support a lot of standards CSS, HTML5, Web APIs, ...
http://ietestdrive.com
<fantasai> HTML5 appcache, async canvas, drag and drop, forms and
validation, structured clone, history API, parser sandbox, selection,
semantic elements, video and audio
<fantasai> ICC Color profiles
<fantasai> Indexed DB
<fantasai> Page Visibility
<fantasai> Selectors API L2
<fantasai> SVG Filter Effects
<fantasai> SVG standalone and in HTML
kk: also look at the IE blog
<fantasai> Web Sockets
<fantasai> Web Workers
<fantasai> XHTML/XML
<fantasai> XMLHttpRequest L2
<fantasai> "Items for Discussion"
<fantasai> * WG Testing Inconsistent
<fantasai> - When are tests created? Before LC? CR?
<fantasai> - When are tests reviewed?
<fantasai> - vendor prefixes
<fantasai> - 2+ impls passing tests required for CR?
<fantasai> * Review Tools (none)
kk: issues are inconsistent testing across WGs
<fantasai> Note -- that's not quite true anymore, plinss wrote one
for csswg :)
kk: when tests are created e.g. related to last call or earlier
... soft rules for how a spec is allowed to progress are maybe not
enough
plh: these are soft rules currently
jj: test tools recently developed have helped with consistency,
flushing out remaining inconsistencies is a goal
... different test platforms result in different tests as submitted
to W3C
Michael_Cooper: experience has convinced me that tests should be
available by last call
Kris_Krueger: why would this not be a rec across W3C?
plh: it's not easy to enforce
... some WGs will complain
jj: amping the expectations on testing will help
mc: it should be the rule, with exceptions allowed
<Zakim> MichaelC_SJC, you wanted to say I now believe tests need to
be ready by Last Call
Elika_Etemad: implementations are needed to see how tests are
working
James_Graham: the process does not map to browser development
reality
Elika_Etemad: it's difficult to say when spec development is done,
and thus to set a hard deadline
John_Jansen: problems often cause the specs to move backward
Elika_Etemad: CR is test the spec phase, not fixing bugs in browsers
... having to move CR back due to bugs is an issue, we need an
errata process to allow edits in CR
plh: we are not here to fix the W3C process
John_Jansen: the more times you go thru the circle
(edit/implement/test) the better, and also the earlier
James_Graham: when we implement we write the tests... test suites
should not be closed
<fantasai> James_Graham: The state of the spec is irrelevant to when
we write tests
Mike_Smith: the Testing IG is scoped broadly perhaps too much so.
The IG will decide what its products will be, e.g. a best practice
on when test suites are developed.
... writing this down even if we do not fix the process will help
others avoid the same mistakes of the past
... it will still have some value
Wilhelm_Andersen: how do you run tests, what is automated, is
development in-house
Kris_Krueger: write our own tests
plh: from JQuery?
Kris_Krueger: no, customer feedback is also considered
... e.g. Gmail support provides feedback
... have a lot of automated tests, ship every Tuesday, and get quick
feedback from users/developers
Narayana_Babu_Maddhuri: is there any review of the test cases to
determine whether the test is a valid test, validation of the test results?
plh: the metadata of the test log should clarify what is being
tested
Kris_Krueger: pointing to where the test relates to the spec is
helpful
plh: we cannot force metadata into tests, but we can encourage this
info to help ensure test value clarity
Narayana_Babu_Maddhuri: good reporting would be helpful
plh: knowing e.g. what property works across devices and platforms
is a goal, and matching tests to specs would support that
James_Graham: knowing why something is failing is sometimes
difficult, dependencies are not clear and why the test failed is
unclear
<plh> [lunch]
<MichaelC_SJC> == Lunch break is 1 hour ==
<ctalbert_> http://people.mozilla.org/~ctalbert/automationpresentation/Automation.html
Testing Firefox
<krisk_> Firefox Testing Presentation
<krisk_> clint: Tools automation lead at Mozilla
<krisk_> Clint: overview of their testing
<krisk_> Grown over the years
<krisk_> Test Harnesses
<fantasai> "Automation Structure: Test Harnesses"
<fantasai> - C++ Unit
<krisk_> C++ Unit testing, XPCShell, not too interesting for this
group
<fantasai> - XPCShell (javascript objects)
<fantasai> - Reftest
<fantasai> -Mochitest
<fantasai> -UI Automation Frameworks
<fantasai> - Marionette
<krisk_> Mochitest - tests dom stuff
<krisk_> New UI automation framework - Marionette
<krisk_> Reftest drill down
<fantasai> "Reftest: style and layout visual comparison testing"
<fantasai> Reference: <p><b>This is bold</b></p>
<fantasai> Test: <p style="font-weight: bold">This is bold</p>
<fantasai> clint: The test and the reference create the same
rendering in different ways.
<fantasai> clint: Then we take screenshots and compare them pixel by
pixel
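<fantasai> [Not from the meeting: in Mozilla's harness a pairing like the
one above is declared in a reftest.list manifest, one comparison per line;
the file names here are illustrative.]

  == bold-style.html bold-markup-ref.html   # test and reference must render identically
  != bold-style.html plain-text-ref.html    # mismatch reference: renderings must differ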
<fantasai> clint: Mochitest is an HTML file with some javascript in
it.
<fantasai> clint: One of the libraries it pulls in is the SimpleTest
library.
<fantasai> clint: It has the normal asserts: ok, is, stuff to
control whether asynchronous or not
<fantasai> clint: This other file here (in this example) turns off
the geolocation security prompts
<fantasai> clint shows a geolocation test
<jhammel> ^ http://mxr.mozilla.org/mozilla-central/source/dom/tests/mochitest/geolocation/test_allowWatch.html
<fantasai> plh: How does this route around the security checks?
<fantasai> clint: uses an add-on
<fantasai> clint: has a special powers api
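[Not from the meeting: a minimal sketch of what a Mochitest file looks
like, assuming the standard SimpleTest boilerplate; the actual checks are
illustrative.]

  <!DOCTYPE html>
  <html>
  <head>
    <title>Mochitest sketch</title>
    <script src="/tests/SimpleTest/SimpleTest.js"></script>
    <link rel="stylesheet" href="/tests/SimpleTest/test.css"/>
  </head>
  <body>
  <script>
  // SimpleTest supplies the ok()/is() asserts and async control clint mentions.
  SimpleTest.waitForExplicitFinish();
  window.addEventListener("load", function() {
    ok(document.body, "document has a body");
    is(document.title, "Mochitest sketch", "title is what we set");
    SimpleTest.finish();
  }, false);
  </script>
  </body>
  </html>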
<fantasai> "Marionette: Driving Gecko into the future"
<fantasai> This is a mechanism we can use to drive any gecko-based
application either by UI or by inserting script actions into its
various script contexts.
<fantasai> How it works -
<fantasai> 1. socket opened from inside gecko
<fantasai> 2. Connect to socket from test harness, either local or
remote
<fantasai> 3. Send JSON protocol to it
<fantasai> 4. Translates JSON protocol into browser actions
<simonstewart> uses webdriver json protocol streamed over sockets
directly
<fantasai> 5. Send results back to harness in JSON
<jhammel> wiki page: https://wiki.mozilla.org/Auto-tools/Projects/Marionette
<jhammel> (WIP)
<fantasai> clint: We run all of these tests on every checkin on every
tree we build on.
<fantasai> clint: Goes into a dashboard
<fantasai> slide: shows screenshot of TinderboxPushLog
<fantasai> wilhelm: Can we steal your Mochitests? What do we need to
do to do so?
<fantasai> clint: Check them out of the tree and see how well they
run in Opera
<fantasai> clint: Some of the stuff we did, e.g. special powers
extension,
<fantasai> clint: but it's now a specific API (used to be scattered
randomly throughout tests)
<fantasai> clint: If you had something similar and named it
specialpowers, then you could use that to get into your secure
system
<fantasai> clint: So should be possible.
<fantasai> clint: A lot of tests we have in the tree are completely
agnostic; don't do anything special at all, should work today
<jhammel> mochitests are at http://hg.mozilla.org/mozilla-central/file/tip/testing/mochitest
<fantasai> wilhelm: Are there plans to release these tests to
geolocation wg?
<fantasai> clint: I think they already did. The guy who wrote the
tests is on that wg
<fantasai> kk: ... they're hard-coded to use the Google service. If
you don't use it, they don't run...
<fantasai> kk: Not too many though
<fantasai> some discussion of sharing tests
<fantasai> Alan: I think WebKit is using some Mozilla reftests, but
not using them as reftests
<fantasai> kk: I'm fine w/ reftests. But of course won't work for
everything.
<fantasai> kk: CSS tests we wrote are self-describing.
<fantasai> Alan: do you have automation?
<fantasai> kk: Yes
<fantasai> rakesh: Do you run the tests every day?
<fantasai> clint: Every checkin
<fantasai> clint: Different trees run different numbers of tests.
<jhammel> https://tbpl.mozilla.org/
<fantasai> clint: Our goal is to have test results back within 2
hours. Right now we're averaging 2.5hrs
<fantasai> fantasai: You're responsible for watching the tree and
backing out if you broke something.
<fantasai> discussion of test coverage
<fantasai> discussion of subsetting tests during development
<fantasai> wilhelm: How much noise do you have?
<fantasai> clint: Don't know about false positives
<fantasai> clint: Probably not many; once we find one, we check for
that pattern elsewhere
<jhammel> orange factor, for tracking failures: http://brasstacks.mozilla.com/orangefactor/
<fantasai> clint: Thing we really have is intermittent failures
<fantasai> clint: We're trying really really hard to bring it down
<fantasai> clint: Used to be on every checkin you'd get, on average,
8 intermittent failures
<fantasai> clint: we pushed it down to 2
<fantasai> clint: And then we added the Android tests
<fantasai> clint: trying to bring it down again
<fantasai> duane: Can I instrument Marionette today in FF7?
<fantasai> clint: No, code we're depending on now is landing
currently on Nightly
<fantasai> clint: Released probably... May?
<fantasai> clint: Depending on work done by Developer Tools group
<fantasai> clint: They have a remote debugging protocol they're
implementing
<fantasai> clint: Will be really nice; decided this would be great
to piggyback on. Don't need two sockets in lower-level Gecko.
<fantasai> clint: So won't be available until that's released.
<fantasai> clint: Currently in a project repo... land in Nightly in
~2.5 weeks
<fantasai> plh: Marionette is only for Fennec, not for the desktop
version?
<fantasai> clint: For Fennec right now. Planning to go backwards and
use for Desktop as well.
<fantasai> clint: My goal is to move all our infrastructure towards
that
<fantasai> kk asks about reducing orange
<fantasai> clint: It's mostly a one-by-one effort of fixing the
tests
<simonstewart> Interesting comment about avoiding using setTimeout
in tests
<fantasai> kk: Are you going to take Mochitests into W3C? Anything
preventing you?
<fantasai> clint: Nothing right now. We'd have to clean them up and
make them cross-browser. Good for everyone, not opposed, just a
matter of finding people and time
<fantasai> jgraham: there's a bug on making testharness.js look like
Mochitest to Mozilla
Testing Opera
<fantasai> "This looks vaguely familiar"
<fantasai> wilhelm: Say a few words about testing at Opera
<fantasai> wilhelm: We have a mainline, which is supposedly always
stable, and then when we're developing a feature, it gets branched
and at some point tests start passing (that's the yellow, b/c out of
sync with mainline) and then we merge and that becomes mainline
<fantasai> diagram shows mainline with six green dots going forward
<fantasai> branch goes off, two red dots, one yellow
<fantasai> arrow from mainline to green dot on feature branch
<ctalbert_> The wiki page we (mozilla) wrote that details our
"lessons learned" from fixing intermittently failing tests is here:
https://developer.mozilla.org/en/QA/Avoiding_intermittent_oranges
<fantasai> arrow from green dot back to green dot on mainline
<fantasai> jgraham: ...
<fantasai> jgraham: Our setup's a bit different
<fantasai> jgraham: All the tests are in subversion in their own
repository that's separate from the code. It's just a normal
webserver: apache, php
<fantasai> jgraham: When you ask for tests to be run, they get
assigned from the server and we send them out to a couple hundred
virtual machines
<fantasai> jgraham: not quite MSFT's setup
<fantasai> jgraham: And then we store every result of every test
<fantasai> jgraham: I think you just store "did all the tests
pass".. we store, in this build this test passed.
<fantasai> jgraham: We have a huge database of this information
<fantasai> jgraham: Theoretically we can delete stuff, but we store
everything.
<fantasai> jgraham: In a mainline build from yesterday, we ran a
quarter of a million tests
<fantasai> jgraham: That's not quarter million files -- it's 60,000
files, some of which produce multiple results
<fantasai> jgraham: e.g. some tests from HTML5 test in W3C, one file
might produce 10,000 results
<fantasai> jgraham: Typically it's a JS thing and it just runs a
bunch of code and at the end it has some results
<fantasai> jgraham: Dumps them to the browser in some way
<fantasai> jgraham: The way we do that right now is pretty stupid,
so I won't talk about it
<fantasai> slide: Visual tests, JS tests, Unit tests, Watir tests,
Manual tests :(
<fantasai> jgraham: System was designed 7 years ago or sth
<fantasai> jgraham: For visual tests, you just take a screenshot,
and then we store the screenshot.
<fantasai> jgraham: Someone manually marks whether that screenshot
was a pass or fail.
<fantasai> jgraham: Don't do that. You have to do it once per test,
and then once any time anything changes very slightly
<fantasai> jgraham: e.g. introduce anti-aliasing test, have to
re-annotate all tests
<fantasai> jgraham: this format is deprecated
<fantasai> wilhelm: We have 20,000 tests on 3 different Opera
configurations...
<fantasai> wilhelm: We want to kill these tests and use reftests
instead
<fantasai> jgraham: Oh, reftests should be on that list too
<fantasai> jgraham: Recently we implemented reftests, and we're
actively trying to move tests to reftests.
<fantasai> jgraham: You can't test everything with reftest, but when
you can it's much better
<fantasai> Alan: Do you keep track of when the reference file bitmap
changes?
<fantasai> Alan: What if both the reference and the test change
identically such that the test should fail but doesn't?
<fantasai> plinss: In the case of the CSSWG when we have a fragile
reference, we have multiple references that use different techniques
<fantasai> jgraham: We have a very lightweight framework we used to
use for JS tests. Only allowed one test per page.
<fantasai> jgraham: Easy to use, but required a lot of convoluted
logic for each pass/fail result.
<fantasai> jgraham: For new test suites, we're using testharness.js
<fantasai> jgraham: similar to Mozilla's MochiKit
<fantasai> jgraham: Unit tests are C++ level things not worth
talking about here
<fantasai> jgraham: When things need automation, we use Watir --
discussed this morning
<fantasai> jgraham: When all else fails, we have manual tests
<fantasai> wilhelm: Notice that the monkey looks really unhappy
<fantasai> jgraham: For the core of Opera, we schedule a test day
and just run tests
<fantasai> plh: How many manually tests do you have?
<fantasai> wilhelm: around 2000 before, less now...
<fantasai> wilhelm: Probably spend about a man-year on manual tests
per year
<fantasai> wilhelm: Say some things about challenges we have, things
we need to take into account when writing tests internally and for
W3C
<fantasai> wilhelm: First thing is device independence
<fantasai> wilhelm: We run 3 different configurations of Opera:
Desktop profile, Smartphone profile, and TV profile
<fantasai> wilhelm: Almost every time someone requests a build, it
will be tested on those three profiles
<fantasai> wilhelm: We notice that if you have a static timeout in
your test, e.g. wait 2s before checking result, that will break on
stupid profile with low resources
<fantasai> wilhelm: On some platforms we automatically double or
triple it, and we hope it works, but it's not really good solution
<fantasai> jgraham: How do you deal with ... ?
<fantasai> clint: we time out our tests after a set time period and
mark it as failed
<fantasai> jgraham: Main assumption is: don't depend on device size
or speed -- the test will randomly fail.
<fantasai> wilhelm: Brings me to the next problem: random
<fantasai> wilhelm: If you have so many tests and even small
percentage fail randomly, going to spend man-years investigating
those failures
<fantasai> wilhelm: When we add new configurations, when we steal
tests from source of unknown quality, we spend many man-years
stamping out randomness in the tests
<fantasai> wilhelm: The more complex the test, the more likely to
randomly fail
<fantasai> wilhelm: Simplest tests are JS.
<fantasai> wilhelm: For imported tests from random sources, could be
very bad
<fantasai> wilhelm: Then comes visual tests
<fantasai> wilhelm: Sometimes complexity is needed, but if can
simplify will do that
<fantasai> wilhelm: We have a quarantine system: run 200 times on
test machines first to make sure its good
<fantasai> wilhelm: Still, sometimes things slip through.
<fantasai> wilhelm: We steal your tests. Thank you.
<fantasai> slide: jQuery, Opera, Chrome, Microsoft, mozilla, W3C
<fantasai> wilhelm: Keeping in sync with the origin of the test is
difficult
<fantasai> wilhelm: When someone updates a test elsewhere, we don't
automatically get that
<fantasai> wilhelm: When we muck about the test to get it to work on
our system, we have to maintain patches
<fantasai> wilhelm: If we fix bad tests, sometimes easy to
contribute back, but sometime not
<fantasai> wilhelm: Automating tests to use our Watir scripts, can
also become a problem.
<fantasai> wilhelm: Our current approach is not usable
<fantasai> wilhelm: need a better way for us all to keep in sync
<fantasai> kk: This is why we have submitted and approved folders
<fantasai> jgraham: The problem from our POV is really .. part of it
is a version control problem on our end
<fantasai> jgraham: Don't have a good way to keep our patches
separate from upstream changes
<fantasai> jgraham: If we have W3C tests, and we pull a new version,
don't have a way to say "these are bits we changed to make it work
on our version"
<fantasai> jgraham: ... reporting and script file separate
<fantasai> jgraham: if we pull some tests from Mozilla, say, and
they're JS engine tests and they update them, if we try and merge
them.. someone has to work out how to do that by hand. It's kind of a
nightmare.
<fantasai> wilhelm: Last thing about randomness, esp imported
<fantasai> wilhelm: Some tests rely on external servers.
<fantasai> wilhelm: Great when we only had a few tests
<fantasai> wilhelm: But now it's a problem. Servers go down, etc.
<fantasai> wilhelm: Conclusion there is: don't do that. :)
<fantasai> wilhelm: That's it!
<fantasai> jhammel: Wrt upstream tests, standardizing on formats and
standardizing on process
<fantasai> wilhelm: We set up time at 3:15 today to discuss this
exact issue
<fantasai> mc: You say you have to fix tests to work on your
product.
<fantasai> mc: Question is how do you separate fixing test to be not
random, vs. making them work on a particular product
<fantasai> jgraham: When we pull in tests, we try not to change
anything to do with the test.
<fantasai> jgraham: We don't require the tests to pass to be in our
system.
<fantasai> jgraham: The thing we need to change is, can this test
report back to our servers.
<fantasai> jgraham: But external tests are usually not designed that
way.
<fantasai> wilhelm: I think testharness.js approach is good, because
those are separated.
<krisk_> That is the end of Opera's presentation
<krisk_> The next person up is peter from HP on css wg update (10
minutes)
<krisk_> Then a discussion on rendering tests for about 1 hour
Testing in the CSS WG
<krisk_> test.csswg.org
<krisk_> has lots of information on CSS WG testing
<krisk_> Tests are 'built' from xml into multiple formats - html,
xhtml, etc...
<krisk_> Test harness is a wrapper around the tests that are loaded
in an iframe
<krisk_> It loads the tests that have the fewest results first
<krisk_> The harness has a filter for spec section, etc..
<krisk_> The harness has meta-data description for each of the tests
<stearns> test format requirements: http://wiki.csswg.org/test/css2.1/format
<krisk_> The harness also has test results that can be shown for
each of the browser/engine versions
<krisk_> Build process has requirements that will be improved
over time - metadata, ref test, title, etc...
<krisk_> Adding meta-data helps review process, though most
submitters don't like to add this data
<krisk_> Multiple refs for the same test exist and a negative ref
test as well
<krisk_> You can have two ref tests if the spec has two different
results - for example margin collapsing
<krisk_> If a ref test can't be used then in some cases a
self-describing test works
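<krisk_> [Not from the meeting: a minimal sketch of a self-describing
test in the CSS WG format (see the format link above); the pass
condition is stated in the page so a human can judge it, and the
help/assert metadata is illustrative.]

  <!DOCTYPE html>
  <html>
  <head>
    <title>CSS Test: color applies to element text</title>
    <link rel="help" href="http://www.w3.org/TR/CSS21/colors.html#propdef-color">
    <meta name="assert" content="The 'color' property colors an element's text.">
    <style>p { color: green; }</style>
  </head>
  <body>
    <p>Test passes if this text is green.</p>
  </body>
  </html>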
<plinss> http://test.csswg.org/annotations/css21/
<krisk_> Spec annotations are used that map back to the annotated
spec
<krisk_> The annotated spec has total tests and results for each
section of the spec
<krisk_> Now on to the test review system
<krisk_> http://test.csswg.org/shephard/
<krisk_> Very tight coupling to the css test metadata
<krisk_> Tracks history and other information about a test case
<krisk_> jgraham: is this tied to the test file?
<krisk_> peter: no it's possible to have this information in another
file
<krisk_> jgraham: can this handle a case when multiple files are
used to create a lot of tests
<krisk_> peter: yes we have the same issue for the media query test
cases
<krisk_> Wilhelm: So does css still use visual non-ref tests?
<krisk_> fantasai: for css3 we require ref-tests, so no
<krisk_> peter: The system is built to save time and automate parts
<krisk_> peter: for example when a test is approved it is moved from
submitted to approved
<krisk_> Michael: Does the system have access control checks for
approval?
<krisk_> peter: yes
Testing Chrome
<krisk_> Ken: Chrome Testing Information
<simonstewart> kk: works on the chrome automation team
<simonstewart> kk: not an automation group in the same sense as
mozilla
<simonstewart> chrome depends on webkit
<krisk_> kk is not krisk
<simonstewart> webkit layout tests, pixel-based tests
<simonstewart> kk == ken_kania
<simonstewart> kk: dom dump tree tests
<simonstewart> kk: not got a lot of insight into the specifics of
the webkit tests. Focuses mainly on the chrome browser
<simonstewart> kk: couple of layers of testing
<simonstewart> kk: lowest layer is the c++ browser tests
<simonstewart> kk: probably more than other browsers do. Special
builds of chrome which will run C++ in the ui thread
<simonstewart> kk: relatively low level, though
<simonstewart> kk: beyond those, there are the ui test framework.
Based on the automation proxy (AP)
<simonstewart> kk: ap is pretty old, but is an ipc mechanism
<simonstewart> kk: very much internal facing
<simonstewart> those tests are still fairly low level, despite being
called ui tests
<simonstewart> kk: higher than that, Ken's team work on something
called the chrome bot
<simonstewart> kk: runs on real and virtual machines
<simonstewart> kk: a cache of a large number of sites.
Often used for crash testing. Also include tests that perform random
ui actions
<simonstewart> kk: a little bit smarter than pure random, but that's
the gist
<simonstewart> kk: qa level tests. Tests that are done by manual
testers. Piggy back off the ui test automation framework. things
like creating bookmarks, installing extensions, etc
<simonstewart> kk: break down manual testing into parts. First
app compat: push a new release of chrome and check it continues to work.
And testing chrome at the ui level
<simonstewart> Most of the ui is "based on the web"
<simonstewart> For the chrome specific native widgets there are
manual tests
<simonstewart> kk: app compat depends on webdriver
<simonstewart> kk: lots of google teams depend on webdriver to
verify that sites work.
<simonstewart> kk: guess that at a high level, the testing strategy
tends to be developer focused.
<simonstewart> kk: devs should write the tests in whatever tool and
harness is most expedient for their purpose
<simonstewart> kk: piggy back a lot on the fact that chrome does
rapid releases. 4 channels release to users (canary, dev, beta,
stable)
<simonstewart> kk: different release schedules
<simonstewart> kk: depend a lot on user feedback from the canaries
<simonstewart> kk: that's the gist of it
<simonstewart> tab: sounds good to me
<simonstewart> jhammel: do chrome do performance testing?
<simonstewart> kk: we do. Using the AP and the ui testing framework
mentioned earlier
<simonstewart> http://build.chrome.org
<simonstewart> to view the tests that have been run
<simonstewart> plh: do you run jquery tests
<jhammel> ^ correction: http://build.chromium.org
<simonstewart> kk: not really. webkit guys might, and we pick that
up
<simonstewart> krisk_: do you create tests and feed them back
<simonstewart> TabAtkins: we don't do much, but we do
<simonstewart> krisk_: is that because it doesn't fit with the
systems
<simonstewart> TabAtkins: the ways we write and run tests isn't
really compatible with the existing w3 systems.
<simonstewart> TabAtkins: would like to change that!
<simonstewart> TabAtkins: some tests are html/js. which might be
used where possible. Doesn't happen that regularly
<simonstewart> krisk_: how do you know that you're interoperable?
<simonstewart> TabAtkins: in terms of webkit stuff, it's a case of
testing being done by different browser vendors
<simonstewart> kk: lots of c++ tests that are specific to chrome
<jhammel> simonstewart: np :)
<simonstewart> krisk_: v8?
<simonstewart> TabAtkins + kk: v8 team live in europe. Who knows?
<simonstewart> wilhelm: also has legacy stuff for opera. New tests
written in a way that (in theory) is usable outside. Can chrome do
the same thing?
<simonstewart> TabAtkins: will agitate for that. Involved in spec
writing rather than active dev, so might be tricky
<simonstewart> wilhelm: This is a great forum to raise those issues.
Opera happy to share with Chrome if Chrome does the same :)
<simonstewart> krisk_: do chrome try and pass a bunch of the w3c
test suites?
<simonstewart> TabAtkins: yes. Some of them might be integrated into
the chromium waterfall. Some of them might be run by hand
<simonstewart> ?? does anyone know about webkit testing
<simonstewart> TabAtkins: the people I'd like to ask aren't
around
<simonstewart> webkit does seem to take in test suites from mozilla.
They're running against a bitmap that's different from the moz
rendering
<simonstewart> TabAtkins: we don't have a good infrastructure for
ref tests
<simonstewart> TabAtkins: the test infrastructure people _do_ want
to fix that
<simonstewart> TabAtkins: every time a new port is added to webkit,
there are more pixel tests. Provides pressure to do better
<simonstewart> plh: any other questions?
<simonstewart> 15 minute break coming up
Info available from webkit: https://trac.webkit.org/wiki
also see http://www.webkit.org/quality/testing.html
<krisk_> Next agenda item: jgraham talking about testharness.js
<MichaelC_SJC> scribe: krisk_
testharness.js
<fantasai> scribenick: fantasai
jgraham: testharness.js is something I wrote to run tests.
... It runs JS tests specifically
... It's a bit like MochiTest or QUnit which JQuery uses, or various
things
<plh> --> http://w3c-test.org/resources/testharness.js testharness.js
jgraham: Every JS framework has invented its own testharness
... This has slightly different design goals
... The overarching goal is that it's something we can use to test
low-level specs like HTML and DOM
... So it can't rely on lots of HTML and DOM :)
... The design goals were to provide some API for writing readable
and consistent tests
in JS
jgraham: Our previous harness at Opera, as I mentioned, didn't result
in very readable tests
jgraham: The other is to support testing the entire DOM level of
behavior
... There are 2 test types : asynchronous tests and synchronous
tests
... second is purely syntactic sugar
... Another design goal was to allow possibility of the test to have
multiple assertions, and all have to be true for test to pass
... typical example might be checking that some node has a set of
children.
... Might want to first test for any children before testing that
4th child is a <p>
... Multiple tests per file was a requirement; learning from Opera's
1/file, which was painful for test writers and discouraged many
tests
... ... runs everything in try-catch blocks
... One feature of that is that every bit of the test is like a
function, basically
... it tries to handle some housekeeping.
... if you have 1000 tests in a file, nice if you can time out those
tests individually
... Uses setTimeout(); can override that if you want, e.g. if
running on slow hardware
... and a design goal was easy integration with browsers' existing
test systems
... Should be easy to use on top of MochiKit or whatever you use for
reporting results
... next thing I thought I'd do is go through creating a test.
jgraham's text editor:
<script src="resources/testharnessreport.js"></script>
<script src="resources/testharness.js"></script>
<div id="log"></div>
jgraham: By default testharnessreport.js is blank. It's for you to
integrate into your testing system.
... the order is not at the moment relevant
... we might later check in testharness.js that testharnessreport.js
was included
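[Not from the meeting: a minimal sketch of what a non-blank
testharnessreport.js might do, assuming testharness.js's
add_completion_callback hook; report_to_local_harness is a hypothetical
function standing in for the vendor's own result collection.]

  add_completion_callback(function(tests, harness_status) {
    tests.forEach(function(t) {
      // each result object carries a name, a status code, and PASS/FAIL/TIMEOUT constants
      report_to_local_harness(t.name, t.status === t.PASS ? "pass" : "fail", t.message);
    });
  });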
added to file:
(at the top)
<title> Dispatching custom events</title>
(at the bottom)
<script>
var t = async_test("Custom event dispatch");
</script>
jgraham: Each test has a number of steps, and each step is a
function that gets called
... It gets called inside a try-catch block, and we can check if the
test failed. We don't put anything as top-level code.
(added at the bottom)
t.step(function() {
(ok, that's too much to type)
jgraham: Here it's adding an event listener before the second step
... When it gets called, it'll call this other function here, which
will run this other step, which is another function. Can get a bit
verbose.
... There's a convenience method that will make this easier.. all
documented in testharness.js
... Simple assert_equals() with value we get, value we expect, and
then you can optionally have a string that describes what it is
you're asserting.
... At this point everything we want done is done, so we say
t.done();
... If you load this in a browser, because we have div#log, it will
show whether it passes or fails and what assert failed
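[Not from the meeting: pulling jgraham's fragments together, a complete
file along these lines should run standalone; the dispatched event and the
assertion are illustrative.]

  <!DOCTYPE html>
  <title>Dispatching custom events</title>
  <script src="resources/testharness.js"></script>
  <script src="resources/testharnessreport.js"></script>
  <div id="log"></div>
  <script>
  var t = async_test("Custom event dispatch");
  t.step(function() {
    // the listener runs as another step via the step_func convenience wrapper
    document.addEventListener("test", t.step_func(function(e) {
      assert_equals(e.type, "test", "event type seen by the listener");
      t.done();
    }), false);
    var ev = document.createEvent("Event");
    ev.initEvent("test", true, true);
    document.dispatchEvent(ev);
  });
  </script>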
<plh> --> http://w3c-test.org/webapps/ElementTraversal/tests/submissions/W3C/Element-childElementCount.html Example of testharness.js
jgraham: That's all
jj: Is there an id on the steps, so that you can say you failed step
4 of test foo?
jgraham: If there's demand, there could be a second argument there.
jj: would be nice to know where it failed so I can set a breakpoint
there
jgraham: If you get a huge number of tests per file, it's usually
auto-generated
... if it's failing in an assert, then it'll tell you which assert
failed
plh shows his example
plh: everything shown here is generated by testharness.js
jgraham: There's a failure in this, and it seems everyone fails
that.
plh: Bug in testharness.js
jj: Easiest way to debug the test. Is there an error in the test,
error in testharness.js, or error in browsers
jgraham: There are various types of assertions. Usually corresponds
to webIDL
... But what's in webIDL isn't always the same
kk: It's pretty well-written, only 700 lines or so
clint: If it's synchronous, you don't have to do t.step()
jgraham: A test that is synchronous implicitly creates a step
wilhelm: Opera currently uses this tool for all the new tests that
we write. Can others use this?
clint: Yeah, I think so
kk: There used to be some nunit or something that W3C had
... Was in IE, but some browsers couldn't run it.
... Very complicated
[server problems]
plinss: Are tests grouped by section into files?
jgraham: In this case, it checks reflection section, plus section of
each part of the spec that defines a reflected attribute
topic change
wilhelm: plh wanted to talk about test harness, fantasai wanted to
talk about syncing problem
How should we organize public test suites so that they are as easy as
possible to contribute to and reuse?
http://w3c-test.org/framework/
MikeSmith: This is an instance of the framework peter demoed
Mike: I'm going to show you what has been added here to make it
easier for test suite maintainers to add data to the system.
... There's this area called Maintainer Login
... It'll give you an http_auth, which authenticates against W3C's
user database
... Email me if you want access to the system
... Once you go in there you'll see 2 options: add metadata, change
metadata
... Can add a specification
... one early piece of feedback I got was they have tests they want
to run that are not associated with a spec.
... So in this instance of the system, it's not a requirement to
have a spec for your test suite
... You can give it an arbitrary ID as long as not a duplicate
... Title of the spec
... URL for the spec
... It expects you'll point it to a single-page version of the spec
... If you have a multi-page spec, don't point it at the TOC. You
need the full version of the spec.
... Could change later, but initially set up this way 'cuz easier
... This will get added to the list here
... Next thing you can do is needed if you want to do what Peter was
demoing earlier, which was associating testcases with specific
sections of the spec -- or specific IDs in the spec
... Structured around idea that you put your IDs per section
... But some WGs like WOFF WG they're putting assertions at the
sentence level
... They don't actually have section titles, so needed to
accommodate that too
Peter: Alan and fantasai did some work on that, too.
... Shepherd tool will be able to parse out spec to find test
anchors
... and then can report testing coverage of the spec, so this is
something we will automate
Alan: What fantasai and I worked out was based on WOFF work, but
will be simpler for spec editors. A bit harder to automate, though
Mike: This part adds spec metadata.
... Instead of a form to fill out, it lists existing specs in the
system
... once you go here, if there's already data in the system, will
show you data in the system already
... otherwise it'll show you generated data
... This parses the spec and pulls out the headings. If it looks ok,
you press submit
... It'll put these section titles into the database.
... If you have IDs below the section title level, then you'll have
to use a different way to get it into the DB
... You might have to get me to do it for now :)
... Those steps are optional right now.
... What is necessary is going in and giving info about the test
suite itself.
... you can give it an arbitrary ID
... Title, longer description
... to better explain the test suite
... base URL of where your test suites are stored
... Difference from CSS is, that one requires format subdirectories
plinss: it's optional
Mike: This one doesn't expect subdirectories. Expects all tests in
this one directory
... If you have separate subdirectories...
... Need to make different test suites or ...
... Simplest case you have all tests in one directory
plinss: The code's actually a lot more flexible wrt formats. We'll
talk offline.
MikeSmith: Then you have contact information for someone who can
answer questions about test suites
... Then you indicate format of the test suite
... Then you have a list of flags, you can select which ones
indicate optional tests
... There are ways to add flags to the system
... No ui for it, so contact me
... Last thing you then do is upload a manifest file
... You have to have a test suite
... You select a test suite
... and then what I have it do right now is that you need to point
it to the url for a manifest file, and it'll grab that and read it
in
... Right now two forms of manifest files that it will recognize
... second one here is just a TSV that expects path/filename,
references, flags, links, assertions
... links are the spec links
... The other big change is, I was talking with some people e.g.
annevk and ms2ger
... the format they're using is just listing the filenames
... it marks support files as support files
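[Not from the meeting: rough sketches of the two manifest forms just
described; the file names, flags, anchors and exact column conventions are
illustrative, not the framework's documented format.]

  # tab-separated form: path/filename, references, flags, links, assertion
  custom-event-dispatch.html   custom-event-dispatch-ref.html   dom   #dispatching-events   Custom events reach listeners on dispatch

  # filename-list form: one file per line, support files marked as such
  custom-event-dispatch.html
  support/helper.js support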
kk: Mozilla guys wanted to know what files were needed to pull to
run a test case
plinss: In the CSSWG, the large manifest file with metadata -- that
gets built by the build system
MikeSmith: This form expects the full filename, not just the
extensionless filename
... Because that's what they had
... Once you have that, you should be able to get your test cases
into the test database
... and it'll show up on the welcome page
... Long way to go on this.
... Goal when I started on this was to get it to the point where I
didn't have to manually do INSERT in SQL to get specs into the
database
... What would be really nice is if ppl start using this and getting
more test suites in there so that we can ..
plinss: But right now only limited set of ppl can contribute to that
code
MikeSmith: I created two groups in our database
... I created a group for developers -- anyone who wants to
contribute to framework
... That'll give you write access to hg repo for the source code for
this
... Take a look at source code and see problems, send me patches or
I'll give you direct access
... Second thing is if you want to have access to use this UI to
submit test suite data, I'll have to add you to a particular group
fantasai: how is this code related to plinss's code?
MikeSmith: It's forked from that.
... I've just been pulling the upstream changes
... been able to merge everything without it breaking.
... Think it's in good enough shape that we could port it back
upstream
plinss: This system and the Shepherd share a lot of the same base
code
... Lots of things I was going to port Shepherd system back into
this system, and then pull your stuff in too
... Mike also has code that ties into the testharness.js code, and
will automatically submit results from that
MikeSmith: If you go to enter data, it gives you some choices about
whether you want to run full test suite or not
... There's a button here that will pull automatic results where
possible
... Be careful, this will submit the data publicly!
jgraham: Not saying it's a bad idea, but from our POV, we're not
going to use it offline.
(Brian was talking about trying out the system privately offline)
plinss: The system tracks who's submitting the data. By login if
you're logged in, by IP if not
Brian: Privacy is useful
plinss: goal is for pulling data from as many sources as possible
wilhelm: fantasai wanted to talk about keeping things in sync
<dobrien> Is someone scribing? I can't keep up on the iPad
<ctalbert_> This is the writeup that we are planning to set up at
Mozilla for the CSS tests specifically:
https://wiki.mozilla.org/Auto-tools/Projects/W3C_CSS_Test_Mirroring
<krisk_> Mozilla has a way to move tests from mozilla -> w3c ->
mozilla
<ctalbert_> wilhelm: how will this cope with local patches?
<krisk_> fantasai: The master copy only lives in one place...
<ctalbert_> jgraham: probably not a problem with the css tests
<krisk_> fantasai: approved is the master in w3c
<krisk_> fantasai: submitted is the master for submissions
<ctalbert_> jgraham: opera is thinking of having the master from w3c
which is intact, and our checkout from that master will have the
local patches, and when we pull we'll rebase our patches atop the
w3c master
<ctalbert_> this should be possible now that hg is in the w3c side
and our (opera) side
<ctalbert_> fantasai: we'll probably have to do something similar
<krisk_> wilhelm: how does this handle local patches?
<ctalbert_> jhammel: is there a technical limitation to not have
people editing the w3c tests
<ctalbert_> fantasai: no
<krisk_> fantasai: this is only for css, which doesn't seem to have this
problem
<ctalbert_> jgraham: probably make it a commit hook
<ctalbert_> ctalbert_: agreed
<ctalbert_> peter: if someone pushes to the approved directory
without actually being approved then the system just automatically
denies them
<ctalbert_> that may be incorrect ^ (scribe error)
<ctalbert_> wilhelm: might be an idea to split test suites down at
lower granularity levels so that you can have test suites with
different levels of maturity
<ctalbert_> jgraham: don't think that would make a difference tbh
<ctalbert_> peter: our repo would keep all the data from all the
suites in the repo so that our build system could build any version
of them from any suite
<ctalbert_> wilhelm: are there other things we can do to make it
easier to contribute test suites?
<ctalbert_> fantasai: one problem on the mozilla side - there's no
place to put tests that should go to the w3c - we depend on a manual
process to sort out which should be submitted and then it is done
later
<ctalbert_> fantasai: these tests just sit in a random place and are
forgotten
<ctalbert_> fantasai: once we have a directory that goes to w3c and
we tell the reviewers, then it will help quite a bit.
<ctalbert_> fantasai: the basic idea is to make the process obvious
what developers need to do with that test to indicate that it is
appropriate and ready for w3c then it should "just happen"
<ctalbert_> jgraham: we have a similar problem. it's hard to surface
those tests and bugfixes without a policy and a place for those
tests
<ctalbert_> peter: if we have a standard format among the test
writers then it will be easier to help developers to upload the
tests to the w3c. If the developers have to convert the tests it's
too difficult and people won't expend the effort to make it happen
<ctalbert_> krisk_: sometimes it depends on the editors as to when
they allow tests into the spec, and you find that tests sometimes
lag the spec by quite a bit
<ctalbert_> fantasai: we found that with the css - the person
writing the spec is often nominally tasked with also writing the
test suite but because the skill sets are different and the spec
editor is usually swamped, then the tests get neglected
<ctalbert_> fantasai: we really need a dedicated person to manage
these tests and testing effort for each spec
<ctalbert_> MikeSmith: is there some way to motivate people to do
that?
<ctalbert_> MikeSmith: maybe we should publicly track the testsuite
owner?
<ctalbert_> fantasai: we can do that, but the burden is on getting
resources for that, really.
<ctalbert_> MikeSmith: yeah, the question is how do you encourage
the managers to allow their people to spend time on w3c work
<ctalbert_> MichaelC_SJC: you might be able to convince your company
to do that, but we also need to have the working group chairs
understand that this needs to happen
<ctalbert_> jgraham: if we have them already in an interoperable
format then it's pretty easy, but for our existing tests that are in
a different format, we aren't going to spend the time to convert
them
<ctalbert_> fantasai: we might just have a place at w3c to take
those tests, and just post them publicly and have someone else do
the conversion work
<ctalbert_> jgraham: I suspect that's a wide problem
<ctalbert_> krisk_: if you get in the habit of submitting stuff as
you're doing development, that seems reasonable.
<ctalbert_> krisk_: keeping things not super complex is a win, and
being consistent will pay dividends
fantasai^: Because for Opera it may not be valuable to do the
conversion, but e.g. Microsoft might want those tests, and decide
that the cost of converting is less than the cost of rewriting tests
from scratch, so to them it'll be worth it to do the conversion
<ctalbert_> fantasai: thanks, I'm not too good at this :/
<ctalbert_> (scribe note ^)
<ctalbert_> wilhelm: the more I think of this, the more I realize
that facilitating the handover of tests is a full time job
<Zakim> MichaelC_SJC, you wanted to ask how much should there be a
"W3C format" vs how much does W3C framework need to format (nearly)
any format?
<ctalbert_> wilhelm: if we could get every browser vendor to commit
one person to do this work on their team then that would be good.
<ctalbert_> fantasai: the problem we're at now, people haven't
adopted the w3c formats internally
<ctalbert_> it will be less work once that happens
<ctalbert_> it's not w3c's responsibility to convert your tests to
the w3c format
<ctalbert_> fantasai: you can write a conversion script to convert
your test to w3c format
<ctalbert_> better to do that than to have w3c accept all
different formats
<ctalbert_> jgraham: the problem is that many of these harnesses are
not built for portability
<ctalbert_> MichaelC_SJC: the problem with a common format (and I
may be wrong) is that you run into things you can't test
<ctalbert_> jgraham: if we run into that, then in that case maybe we
can find some lightweight format for those tests, or in that case
maybe we use a different type of harness
<ctalbert_> scribe: ctalbert has to step out
<ctalbert_> fantasai: ^
scribe: fantasai
kk: If you can write it with testharness.js, do that. If not, try
reftest, if not, try self-describing test
... In your case you have the difficulty of needing a screenreader
or something
...
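(For reference, a minimal testharness.js script test looks roughly
like this; the resources/ paths assume the usual W3C test suite
layout.)

  <!DOCTYPE html>
  <title>document.title example</title>
  <script src="/resources/testharness.js"></script>
  <script src="/resources/testharnessreport.js"></script>
  <div id="log"></div>
  <script>
  // One synchronous test; it passes if no assertion throws.
  test(function () {
    assert_equals(document.title, "document.title example");
  }, "document.title reflects the title element");
  </script>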
jgraham: If you can get ppl to contribute in one format, at least
you solve the problem once per platform rather than once per test
mc: I can agree with the idea that there's a hierarchy of goodness
... The framework should have at least the possibility of hooking in
new formats
general agreement
wilhelm: For the Watir cases, we noticed areas where we'd want to
add tests for something very obscure and specific. What we've done is
add support at a low level in Opera and use an API
... Such things could be later added to WebDriver
Alan: For tests where there isn't a w3c version, but browsers have
something, is there a list of most-wanted specs that need tests on
the w3c site
fantasai: All of them? :)
Alan: We were talking about poking ppl, committing ppl to
translating browser tests to w3c tests
... Would be more successful at getting resources if we have a
specific list of things we need
jj: Also possibility to ask specific people.
... Rather than saying, please all submit tests for HTML5
... Say, can you submit tests for WebWorkers
... need a specific ask to get things done
... It might not cause immediate surge in test submissions, but for
me from outside to inside, the idea of submitting tests was
impossible to me. Didn't know where to submit them, figured they'd
be rejected, didn't know what a reftest was, etc.
... So the process was hard, and the ask was not specific
... Better way to get things done is asking
... Would like Opera to submit WebWorker tests
wilhelm: Can I get that in writing so I can show it to my manager?
Alan: Identify the tests, see who has those tests, then request them
plh: We've been working on the testing framework a little bit, but
part of task is also going out there in the wild and finding tests
and getting them to W3C
... Need to get to the point where we have a framework and start
asking for tests
Alan: Use framework to identify areas, since it annotates the spec
jj: We have no idea how much coverage those 47 tests have -- number
isn't meaningful from a coverage perspective
... 1 is better than 0, but maybe 100 is needed, not 47
ss: Test coverage is a negative thing. It'll only say what's not
covered, not how well the covered areas are tested
jj: Even if you say you have 100% on that normative statement, still
doesn't tell you if you got all the edge cases
jgraham: At the moment for HTML we have nothing, though.
jgraham: We have our tests organized by section in the repo, but
it's not explicit
... Being able to say per normative statement, do we have a test for
this, is pretty nice
<plh> --> http://www.w3.org/2011/10/timer.html (annoying) timer
http://www.w3.org/2011/10/timer.html
jgraham: If you look somewhere, there's an annotation per sentence
in the spec showing tests for section X
... But that's really complicated, because spec isn't marked up to
make that easy
... and testing dozens of disconnected statements
kk: The problem we're struggling with is not how to get perfect
coverage. There's a spec, and there's no coverage.
... Browsers all have this feature, and they don't work the same. So
having some is a good start.
Bryan: If you look at most of WebAPIs near LC or at LC, only 1/3
have tests available
<jhammel> fantasai: setup a process for getting tests from *your*
organization to w3c, and *going forward*, you should write
w3c-submittable tests *and* submit the tests. Once that is in place,
we can go back and convert legacy tests
<jhammel> fantasai: we need to get the webkit people to commit to
this
<jhammel> fantasai: you can require that when checked into repo,
they become reftests
<jhammel> fantasai: plan going forward is to convert to reftest
<jhammel> jgraham: if you're comparing to something bitmap-based, it
may take 2x time, but it will save time going forward
fantasai^: Because then the number of legacy tests that are not
w3c-formatted stops growing, and we can work on making that number
smaller
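(For reference, a reftest pairs a test page with a reference page that
must render identically; the CSS WG convention links the two with
rel="match". A minimal sketch with made-up file names -- real
submissions also carry rel="help" links and assertion metadata.)

  <!-- selector-class-001.html: the test -->
  <!DOCTYPE html>
  <title>CSS Test: class selector applies a background</title>
  <link rel="match" href="selector-class-001-ref.html">
  <style>
    .square { width: 100px; height: 100px; background: green; }
  </style>
  <div class="square"></div>

  <!-- selector-class-001-ref.html: the reference, producing the same
       rendering without the feature under test -->
  <!DOCTYPE html>
  <title>Reference: green 100px square</title>
  <div style="width: 100px; height: 100px; background: green"></div>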
Additional Items
example of a test that has to be self-describing: This tests that
the blurring algorithm produces results within 5% of a Gaussian blur
http://test.csswg.org/source/contributors/mozilla/submitted/css3-background/box-shadow/box-shadow-blur-definition-001.xht
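(A self-describing test like that one states its pass condition in
prose so a human can judge the result; the general shape is just:)

  <!DOCTYPE html>
  <title>Self-describing test example</title>
  <p>Test passes if there is a green square below and nothing red.</p>
  <div style="width: 100px; height: 100px; background: green"></div>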
bryan: We developed a number of specs for device APIs
... We recognize these APIs are quite sophisticated, and it'll take
some time, but we're continuing the development of these
capabilities for web runtimes
... We have developer program, global ... ecosystem
bryan (from AT&T): wanted very briefly ...
bryan: show you these links to the specs, the APIs, but more
importantly the test framework
... Test framework is based on QUnit
... Pulls in a file from a test directory, which has the list of
tests associated with this particular API.
... Tests individual JS files in the same directory
... will run them one by one
... This is packaged up as a widget file, which is available for
download
... So we can run all the tests for example using this widget
framework.
bryan shows pie charts of results
bryan: Automatically uploaded and made available to vendor
plh: Say 1000 tests for core web standards?
bryan: No, for APIs
... What comes for underlying platform is inherently tested by that
community
... We need to cover device variation
... identify things that we reference
... We have individual tests for these, test scripts
... this is more than an acid-level test, but not what we hope to see
from W3C in the long run
... We don't want to develop and maintain this level of detail in
WAC. Want to leverage W3C test suites
... If you look at the tests, you can see for example the
geolocation test suite, which we reference.
... We want to auto-generate the tests as widget
jj: So if the test suite changes, do you update your widget?
bryan: Our goal is to create frameworks where we can pull in tests
and run them in this runtime environment without having to
necessarily maintain the tests ourselves
... We would benefit from a common test framework
... What exactly these tests are is basically just a JS procedure
... We test existence of methods, call qunit functions for
pass/fail, not necessarily married to this format, but it was the
most common one at the time we developed this.
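(A rough sketch of that kind of existence check, in QUnit 1.x style;
the API being probed is only an example.)

  // Probe that the geolocation API surface exists; QUnit's ok()/equal()
  // record a pass or fail for each assertion.
  module("navigator.geolocation");

  test("getCurrentPosition is exposed", function () {
    ok(navigator.geolocation, "navigator.geolocation is defined");
    equal(typeof navigator.geolocation.getCurrentPosition, "function",
          "getCurrentPosition is a function");
  });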
... So to summarize our goal is to have the scalability to support
this widget-based ecosystem across dozens of devices across the
world
... So we have to have scalability
... To depend on the core standards as something we don't spend a
lot of effort on
... Duplicate things that eventually come from W3C.
... We'd like to see this developed at W3C so we can directly
leverage it.
fantasai comments on how this shows having a few common formats is
better than having w3c accept many similarly-capable formats -- it
better supports reuse of the tests
Conclusions and Action Items
1. Vendors commit to running W3C tests
2. Vendors push internally to adopt W3C test formats
plh says W3C should make it easier for vendors to import suites
fantasai: what does that entail?
plh: make guidelines for WG
jgraham: I feel the problem is more on our side than on W3C side
wilhelm, jgraham: but of course, using hg instead of cvs is
important for tests
wilhelm: W3C should commit resources to get tests from vendors
plh: start with webapps
wilhelm: Any conclusions on WebDriver discussion?
... We commit to work on the spec, and get that into our browser
plh: MS and Apple should look into that
Mike: normal people at apple are interested, but they're not the
ones who sign off on things
kk: Using testharness.js seems to me a very low-hanging fruit,
rather than writing a whole bunch of APIs
<jhammel> "not buy Apple" would be more effective
wilhelm: There should be a spec that talks about it, for the IP
stuff, we need to get a spec out so there's less risk for those
implementing
jgraham: There was some discussion, but no decision, about which
bindings W3C would accept tests in
wilhelm: I'd list that as an open issue
MikeSmith: We want to follow up with the testing IG, [other group]
MikeSmith: Spec discussion would go to [... mailing list ...]
wilhelm: Dumping ground for non-W3C-format tests
kk: You can put whatever you want in submitted folder
<MikeSmith> public-browser-tools-testing@w3.org
jgraham: It would be nice, if ppl dump random test suites in random
formats, to separate those out from things that would be approved in
roughly their current form
<MikeSmith> http://lists.w3.org/Archives/Public/public-browser-tools-testing/
kk: We should have an old_stuff directory
jgraham: And encourage people to dump stuff there
<MikeSmith> for the Testing IG,
http://lists.w3.org/Archives/Public/public-test-infra/ and
public-test-infra@w3.org
http://lists.w3.org/Archives/Public/public-test-infra/
plh: We can associate a repo with the testing IG, and then anyone in
that IG can push to the repo
<plh> ACTION: Mike to create mercurial repositories for Web Testing
IG and Browser Tools WG [recorded in
http://www.w3.org/2011/10/28-testing-minutes.html#action01]
fantasai: Should be clear that dumping things here is not the same
as submitting to an official W3C test suite
bryan: Should also have a wiki that documents what's there
jj: Right, should be clear these are not submitted for review;
they're there, and someone can take them and convert them and submit
them
<MikeSmith> http://www.w3.org/wiki/Testing
http://www.w3.org/wiki/Testing
jgraham: Come up with a prioritized list of things that need tests
jj: anything that's in CR? :)
plh: I'll take an action item to do that
<scribe> ACTION: plh to make a list of things that need tests
[recorded in
http://www.w3.org/2011/10/28-testing-minutes.html#action02]
bryan: Need a list of what's available, what are the key gaps, what
do we need to get there
kk: Identify specs that are in a bad situation.
fantasai: Also want to track not just what needs testing, but ask
vendors whether they have tests for any of these.
... Can then go pester people to submit those tests
<scribe> ACTION: MikeSmith to Create repos for testing IG and
testing framework group [recorded in
http://www.w3.org/2011/10/28-testing-minutes.html#action03]
plh: Need places to dump tests for groups that don't have repos atm
... more and more groups have their own test repo
<plh> ACTION: plh to convince the geolocation WG to use mercurial
for their tests [recorded in
http://www.w3.org/2011/10/28-testing-minutes.html#action04]
3. Vendors commit to finding a person to facilitate submission and
use of W3C tests
wilhelm: need to make a formal request to each organization
bryan: Someone should pull together format descriptions and include
the guidelines
<plh> --> http://www.w3.org/html/wg/wiki/Testing/Authoring/
Authoring Tests
http://www.w3.org/html/wg/wiki/Testing/Authoring/
discussion of where to collect this information
<plh> --> http://www.w3.org/testing/ Testing
http://www.w3.org/testing/
jgraham: should be in a place not specific to a given working group
...
plinss: There's a lot to be gained by standardizing metadata
jgraham: hard to do the CSS way for an HTML test
... Could have n ways to do it, where n is a small number
Alan: It would be nice to have everything on a wiki so we don't have
to go through a staff member
... What if this page was a redirect to a wiki?
jgraham: Could have that page be a link to a wiki
MikeSmith: I like redirect idea, minimizes work I have to do :)
wilhelm: So when should we meet again?
jj: I think we should definitely make this a regular meeting.
... Seems like everyone in every WG is going to be solving the same
problems
...
plh: WebDriver will be under browser tools WG
mc: Who's "we"?
wilhelm: I don't know, but this crowd is great.
plh: We can put under the IG
fantasai: We can say at least that we'll meet again at next TPAC
plh: Would be in France next year
fantasai: Since not everyone will be travelling to TPAC, would we
want to also meet at another place at a different time?
jj: Does everyone agree we should meet?
kk: Depends on deliverables.
MikeSmith: If we meet 6 months from now, when would that be?
?: April
mc: Just want to be sure who the "we" is that the invite would go out
to
wilhelm is designated in charge
Meeting closed.
RRSAgent: make minutes
Summary of Action Items
[NEW] ACTION: Mike to create mercurial repositories for Web Testing
IG and Browser Tools WG [recorded in
http://www.w3.org/2011/10/28-testing-minutes.html#action01]
[NEW] ACTION: MikeSmith to Create repos for testing IG and testing
framework group [recorded in
http://www.w3.org/2011/10/28-testing-minutes.html#action03]
[NEW] ACTION: plh to convince the geolocation WG to use mercurial
for their tests [recorded in
http://www.w3.org/2011/10/28-testing-minutes.html#action04]
[NEW] ACTION: plh to make a list of things that need tests [recorded
in http://www.w3.org/2011/10/28-testing-minutes.html#action02]
[End of minutes]
_________________________________________________________
--
Michael[tm] Smith
http://people.w3.org/mike/+
Received on Thursday, 3 November 2011 00:02:30 UTC