minutes of 2011-10-28 face-to-face meeting

http://www.w3.org/2011/10/28-testing-minutes.html

28 Oct 2011

   Agenda

       http://lists.w3.org/Archives/Public/public-test-infra/2011OctDec/0014.html

Attendees

   Present
          Jeanne_Spellman, Bryan_Sullivan, Wilhelm_Andersen,
          James_Graham, Elika_Etemad, Jason_Leyba, Simon_Stewart,
          Kris_Krueger, John_Jansen, Peter_Linss, Mike_Smith,
          Alan_Stearns, Narayana_Babu_Maddhuri, Duane_O'Brien,
          Charlie_Scheinost, Ken_Kania, Jeff_Hammel, Clint_Talbert,
          Tab_Atkins, Michael_Cooper, Philippe_Le_Hegaret

   Chair
          Wilhelm_Andersen

Contents

     * Topics
         1. Introductions
         2. Agenda Overview
         3. WebDriver API
         4. Testing IE
         5. Testing Firefox
         6. Testing Opera
         7. Testing in the CSS WG
         8. Testing Chrome
         9. testharness.js
        10. How should we organize public test suites so that they
            are as easy as possible to contribute to and reuse?
        11. Additional Items
        12. Conclusions and Action Items
     * Summary of Action Items
     _________________________________________________________

   <MichaelC_SJC> scribeNick: MichaelC_SJC

Introductions

   wa: testing helps everybody

   figure out how to make best possible test suites

   <plh> Wilhelm: I'd like to figure how to make the best possible test
   suite, how to make the Web better

   I work for Opera as testmonkey, test manager

   in various parts

   jg: also work for Opera

   <missed the rest>

   ee: also known as fantasai

   work on testing in CSS WG

   jl: work on testing in Google

   want to improve the ecosystem so it all works better

   ss: created Webdriver, working on Selenium

   very aware of the differences between browsers, would love to sort
   it out

   kk: worked in testing at Microsoft

   more recently on Web standards

   jj: also at Microsoft

   interested in automation, test suites

   pl: co-chair of CSS WG

   have contributed extensively to that test suite

   and working on test shepherd for <missed>

   ms: work for W3C, staff contact to HTML WG

   work on testing for HTML, extensive contributions to framework

   as: working for Adobe

   interested in tests working across browsers

   nm: represent Nokia

   nm: learn what's up

   do: <missed>

   <MikeSmith> https://browserlab.adobe.com/en-us/index.html <-
   Adobe BrowserLab

      https://browserlab.adobe.com/en-us/index.html

   cs: represent adobe

   <simonstewart> Ken_Kania

   kk: work for google, Webdriver

   bs: AT&T, mobile data services

   interoperability in various fora

   want to understand the challenges browser vendors have in automation

   and how to leverage tools in repeatable continuous framework

   to certify new devices as they come out, get updated, etc.

   jh: Mozilla, test automation

   ct: Mozilla, testing

   ta: Google, work on Chrome

   not as closely involved in testing, but have worked in CSS on some

   <plh> involved in WAI, staff contact for PF, developing ARIA. We're
   struggling with testing, hoping to contribute to the test framework

   <plh> ... we have requirements that we'd like to bring as well

   plh: W3C, Interaction Domain, lots of your favourite groups

   want a common framework, common way to write tests

Agenda Overview

   wa: first, want browser vendors to introduce how they do testing

   then, presentations of a few testing approaches

   finally, discussion of how to write tests for different types of
   functionality

   90% of tests cover whether something is rendered to the screen in a
   particular way

   or a script returns an expected result

   or a user fills out a form and gets a certain result

WebDriver API

   ss: WebDriver is an API for automation of WebApps

   developer-focused, guides people to writing better tests

   Merged with Selenium a couple years ago

   fairly simple, load page, find element, perform actions like focus,
   click, read, etc.
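
   [A minimal sketch of the kind of script being described, using the
   selenium-webdriver JavaScript bindings; the URL and element locators
   are illustrative, not from the meeting:]

   // Load a page, find an element, interact with it, read state back.
   const {Builder, By, until} = require('selenium-webdriver');

   (async function () {
     const driver = await new Builder().forBrowser('firefox').build();
     try {
       await driver.get('http://example.com/search.html');      // load page
       const box = await driver.findElement(By.name('q'));      // find element
       await box.sendKeys('web platform tests');                // type into it
       await driver.findElement(By.id('go')).click();           // click a button
       await driver.wait(until.titleContains('results'), 5000); // wait for result
       console.log(await driver.getTitle());                    // read state back
     } finally {
       await driver.quit();
     }
   })();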

   kk: does it simulate user input at driver level, or elsewhere?

   ss: in past user interactions were done by simulating events in DOM

   but browsers are inconsistent in how they handle those

   when they do what etc.

   so events at script level not feasible

   so do events at OS level

   that is high fidelity but terrible machine utilization

   and wastes developer's time

   so now, allow window not to have focus and send events via various
   OS APIs

   but OS not designed to send high fidelity user input to background
   window

   so now, Opera and Chrome pump events into event loop of browser

   <scribe not sure that was caught right>

   Webdriver has become a de facto standard for browser automation

   most popular open source framework

   as can be seen by job postings requiring familiarity with it

   has reasonable browser support

   Opera, Chrome, and Android add-on, Mozilla starting

   uses Apache2 license

   business-friendly license

   nm: tried on mobile browsers?

   ss: yes, in various <lists>

   it's a small team

   covering wide range of browsers and platforms

   see 3 audiences for automation

   1) App developers are vast majority

   need to test applications

   hard to get developers to write tests, and can only get them to
   write to one API when you get it at all

   first audience for WebDriver

   2) browser vendors

   desire to automate their testing as much as possible

   bs: how does Webdriver relate to QUnit?

   ss: <didn't catch details>

   bs: so Webdriver isn't a framework, it's an API for automating
   events

   ss: clearly a browser automation API

   e.g., understand Opera runs 2 million tests / day with this

   3) Spec authors

   some specs can be articulated entirely in script

   and tested that way

   others need additional support, this provides that

   ee: more spec testers than authors?

   ss: yes, those focusing on test aspects
   ... user perspective

   it's a series of controlled APIs

   to interrogate the DOM

   execute script with elevated privileges

   and provide APIs to interact, so not just read-only

   jj: <question missed>

   ss: <answer missed>

   jj: avoids cross origin vulnerability?

   ss: yes

   bs: good, some complicated scenarios

   ss: implementer view

   neutral to transport and encoding

   provide JSON

   which brings clients that can handle it immediately

   also have released JavaScript APIs

   ss: Security

   <JohnJansen> My question was regarding the bypass of the x-origin
   security restriction

   ss: automation and security are opposite concerns

   <JohnJansen> answer: the jscript still honors that restriction,
   though webdriver itself ignores it.

   generally, build support into browser

   and enable it via an additional component

   or command line features

   ss: Demo

   <shows short script, then executes>

   kk: how does Opera do it?

   ss: Watir on top of WebDriver
   ... API designed to be extensible

   expose capabilities via a simple interface or casting

   jj: How are visual verifications handled?

   ss: can take a screenshot, platform-dependent

   Opera has extended with ability to get hash of the screenshot

   attempt to capture entire area described by DOM, not just viewport

   deals with difficulties like fixed positioning etc.

   but very browser specific

   jj: human comparison mechanism?

   ss: in google, teams of people do that

   we just provide the mechanism

   don't want to over-prescribe how to process images, as state of the
   art continually changes

   bs: to compare layout between different browsers

   capture screens, or query position of elements?

   ss: can do both

   can get location of an element

   and size

   bs: how about different screen sizes

   interested specifically in how things are rendered in various
   circumstances

   ss: the locatable interface can provide various types of measures
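
   [Sketch of the element-measurement and screenshot calls being
   discussed, with the same JavaScript bindings; current bindings expose
   getRect(), older ones used getLocation()/getSize(), and the locators
   here are illustrative:]

   const {Builder, By} = require('selenium-webdriver');

   (async function () {
     const driver = await new Builder().forBrowser('chrome').build();
     try {
       await driver.get('http://example.com/layout.html');
       const box = await driver.findElement(By.id('box'));
       const rect = await box.getRect();          // {x, y, width, height} in CSS pixels
       console.log(rect.width, rect.height);
       const png = await driver.takeScreenshot(); // base64-encoded PNG of the viewport
       console.log(png.length);
     } finally {
       await driver.quit();
     }
   })();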

   kk: differences among browsers are wide for many reasons

   it's part of the landscape

   ss: was able to use same tests using same APIs

   at rendering level can be different

   plh: platform AAPIs use similar services

   hope e.g., ARIA can use WebDriver

   ss: have looked at AAPIs, can look at elements by ARIA role etc.

   on relationship to AAPIs

   sometimes they're enough, sometimes not

   one of the next big things is hybridized apps, part native and part
   Web

   may need to use AAPIs to test

   plh: think ARIA can be tested using this

   ss: have applied Webdriver to native app testing using AAPIs

   kk: there has been a path starting with MSAA

   ss: AAPIs are extremely low-level

   e.g., a combobox is represented as a few different controls together

   kk: developers create all kinds of crazy things

   so UI automation allows patterns

   mc: can speak to AAPI from WebDriver

   ss: Webdriver sits on top of AAPI

   but because of script interface, could talk back and forth a bit

   wa: Opera has a layer "Watir" on top of WebDriver

   <shows sample>

   test file looks like a manual test, e.g., a human could interact
   with it

   <demos manual execution of test>

   <that can also be executed using the script showed previously>

   for each test file, there's a block in the automation script

   ss: Webdriver similar

   nm: <missed>

   ss: <answer related to webelement.gettext>

   jj: why the wrapping in Watir?

   wa: was done before projects had merged

   now doesn't matter as much

   plan to submit Opera set of tests to HTML WG for official test suite

   but want them in a format other browser vendors could use

   Opera uses Ruby bindings, Mozilla uses Python bindings

   need to automate in all browsers, Webdriver seems the way to go

   for official W3C tests, question of what language binding to use?

   ss: Javascript is hugely known

   Python is the other one being explored by Mozilla and Chrome

   also is "politically unencumbered"

   vs some other candidates out there

   <MikeSmith> I vote for Javascript

   wa: how complete are JS bindings?

   js: still finalizing

   kk: <something detailed>

   js: API stable

   loading script within browser is the part that still needs working
   on, to get around sandbox

   it's usable now, but have debugging etc. to do

   ss: so maybe Python preferable?

   jg: having dependency on core could be a big stability issue

   <^ not sure that's scribed right>

   kk: dangerous to build on things that are changing

   otoh, need bindings to be something that's available on all targets

   ss: normally test and browser communicate like a client / server

   can do over a web socket

   and run test on machine independent of browser

   wa: was able to test a mobile device on a different continent this
   way

   plh: if we set up a test server on W3C site, could you allow it to
   just run tests against you?

   ss: can connect from browser to a test server

   so in theory, this works

   but security concerns

   need a manual intervention to put browser in testing mode

   mc: have to trust W3C server from security POV

   how we allow tests to be contributed needs to be careful

   <general view of usefulness of this approach>

   as: <missed>

   <JohnJansen> as: is there support for IME? how good is it?

   ss: support varies by platform as we prioritize development

   <mentions wherefores and whynots>

   do support internationalized text input

   for testing I18N but could be used to test other stuff

   do: how well documented is JS API?

   ss: fairly extensive

   <jhammel>
   http://code.google.com/p/selenium/wiki/JsonWireProtocol

      http://code.google.com/p/selenium/wiki/JsonWireProtocol

   Facebook developed PHP bindings using this documentation
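
   [For flavour, roughly what two commands look like on the wire in the
   JSON wire protocol documented above; the session id and values are
   made up:]

   POST /session/42/url HTTP/1.1
   Content-Type: application/json; charset=utf-8

   {"url": "http://example.com/"}

   POST /session/42/element HTTP/1.1
   Content-Type: application/json; charset=utf-8

   {"using": "css selector", "value": "#go"}

   response: {"sessionId": "42", "status": 0, "value": {"ELEMENT": "0"}}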

   Selenium stuff hosted under software freedom conservancy

   can use w/o the open source stuff, but also handy to use the open
   source stuff

   wa: Just started the Browser Testing and Tools WG

   <jhammel> http://www.w3.org/2011/08/browser-testing-charter

      http://www.w3.org/2011/08/browser-testing-charter

   primary goal is to standardize Webdriver API at W3C

   <jhammel> (i think)

   welcome you all to join to make this happen

   also want to explore whether all browser vendors can handle official
   test suites using Webdriver API

   ss: aware of support from Google, Opera, Mozilla

   explicit non-support from Microsoft, Apple, Nokia, HP

   also support from RIM

   plh: would Microsoft be able to accommodate tests using this?

   kk: depends

   standardization of the API will help a lot

   <Another link for the WG is http://www.w3.org/testing/browser/>

      http://www.w3.org/testing/browser/

   also need tests structured in certain ways we can work with

   <fantasai> kk: having the tests be self-describing is very
   important. If I was a TV browser vendor that doesn't support
   webdriver, I would want to be able to leverage the W3C tests as well

   jg: tests always structured so you could run manually, though would
   be ridiculous to do so with them all in practice

   ms: first thing we need is a spec

   doesn't matter where editors draft hosted, can do at W3C

   IP commitments kick in when we publish a Working Draft

   ss, wa: ready to move right away on that

   kk: W3C would own code?

   ss: W3C would maintain spec

   and a reference implementation

   but there could be other implementations

   mc: reference implementation doesn't necessarily have to be W3C

   plh: spec is most important for W3C

   ss: all Google testing in some way related to Webdriver

   bs: supported in mobile?

   ss: chrome and android

   wa: also opera for mobile

   bs: so other platforms is just lack of implementation?

   ss: right; Nokia and Apple haven't implemented

   just need a driver

   kk: support IE6? want to get rid of that

   ss: drop support when usage drops below a certain level

   plh: support from Microsoft for Webdriver API will help HTML WG a
   lot

   jj: even if Opera submits tests and HTML adopts, they're
   self-describing so still testable manually

   plh: what does Nokia think?

   nm: Nokia not really interested

   focused on Webkit stuff

   today is first time hearing about it

   ss: it's not just about testing a spec, it's about ensuring users
   can use content in your browser

   so that market force should drive interest even if internal interest
   is elsewhere

   nm: how is performance?

   ss: rapid on Android, but slow on emulator

   iPhone is fast directly and in the emulator

   <something else> fast

   nm: <missed>

   <jhammel> ^ pixel verification

   ss: haven't seen a lot of pixel verification on mobile devices

   <scribe having a hard time hearing or understanding remainder of
   discussion>

   <MikeSmith> agenda:
   http://lists.w3.org/Archives/Public/public-test-infra/2011OctDec
   /0014.html

      http://lists.w3.org/Archives/Public/public-test-infra/2011OctDec/0014.html

   <dobrien> Could we get the minutes updated again as well please?

   jj: propose not requiring webdriver in first version of test suite

   <bryan> Scribenick: bryan

Testing IE

   kk: To walk through testing of IE
   ... shows slides "Standards and Interoperability"

   <fantasai> IE testing diagram: Standards, Customer Feedback,
   Privacy, Accessibility, Performance, Security

   <fantasai> (these are pictured as hexagons around a central
   "Internet Explorer" label)

   kk: IE testing has various chunks as shown on the slide (slides to
   be shared)

   <fantasai> "Internet Explorer Testing Lab" w/ photo

   <fantasai> IE5 -> IE10

   <fantasai> 948 Workstations

   <fantasai> 119 servers

   <fantasai> 1200 virtual machines

   <fantasai> remotely configurable

   <fantasai> 152 versions of IE shipped every "Patch Tuesday"

   <fantasai> Green Lab Initiative saves ~218 tons of CO2/Year

   kk: IE testing lab using a lot of machines with a lot of IE versions
   tested every week

   <fantasai> "Standards Engagement"

   <fantasai> ECMA

   <fantasai> TC39 (Ecmascript 5)

   <fantasai> W3C

   <fantasai> - CSS

   <fantasai> -WebApps

   <fantasai> -HTML

   <fantasai> -SVG

   <simonstewart> Slides for the webdriver notes:
   https://docs.google.com/present/edit?id=0AVrYfCxRNKUGZGc5Nm1ocGh
   fNzFnaGd2bmZnYw

      https://docs.google.com/present/edit?id=0AVrYfCxRNKUGZGc5Nm1ocGhfNzFnaGd2bmZnYw

   <fantasai> -XML

   <fantasai> cycle diagram: Testing -> spec editing -> implementations
   -> (loop back to Testing)

   <fantasai> "Standard Contributions"

   <fantasai> - Spec editing

   <fantasai> -co-chairing

   <fantasai> -test case contributions w3c and ecma

   kk: encourage standards engagement and participation in various
   groups

   <fantasai> -- 14623 tests submitted

   <fantasai> -- across IE9/IE9/IE10 features

   <fantasai> - hardware (Mercurial server)

   <fantasai> - IE Platform Preview Builds

   kk: have contributed a lot of tests and hardware
   ... preview builds allow early access and feedback

   <fantasai> "IE10 Standards Support"

   <fantasai> CSS2.1, 2D Transforms, 3D Transforms, Animations,
   Backgrounds and Borders, Color, Flexbox, Fonts, Grid alignment,
   hyphenation, image values gradients, media queries, multi-col,
   namespaces, OM Views, positioned floats, selectors, transitions,
   Values and Units

   <fantasai> DOM element traversal, HTML, L3 Core, L3 Events, Style,
   Traversal and Range

   <fantasai> ECMASCRIPT 5

   <fantasai> File Reader API

   <fantasai> File Saving

   <fantasai> FormData

   <fantasai> Geolocation

   kk: IE 10 will support a lot of standards CSS, HTML5, Web APIs, ...
   http://ietestdrive.com

      http://ietestdrive.com/

   <fantasai> HTML5 appcache, async canvas, drag and drop, forms and
   validation, structured clone, history API, parser sandbox, selection,
   semantic elements, video and audio

   <fantasai> ICC Color profiles

   <fantasai> Indexed DB

   <fantasai> Page Visibility

   <fantasai> Selectors API L2

   <fantasai> SVG Filter Effects

   <fantasai> SVG standalone and in HTML

   kk: also look at the IE blog

   <fantasai> Web Sockets

   <fantasai> Web Workers

   <fantasai> XHTML/XML

   <fantasai> XMLHttpRequest L2

   <fantasai> "Items for Discussion"

   <fantasai> * WG Testing Inconsistent

   <fantasai> - when are tests created? Before LC? CR?

   <fantasai> - When are tests reviewed?

   <fantasai> - vendor prefixes

   <fantasai> - 2+ impls passing tests required for CR?

   <fantasai> * Review Tools (none)

   kk: issues are inconsistent testing across WGs

   <fantasai> Note -- that's not quite true anymore, plinss wrote one
   for csswg :)

   kk: when tests are created e.g. related to last call or earlier
   ... soft rules for how a spec is allowed to progress are maybe not
   enough

   plh: these are soft rules currently

   jj: test tools recently developed have helped with consistency,
   flushing out remaining inconsistencies is a goal
   ... different test platforms result in different tests as submitted
   to W3C

   Michael_Cooper: experience has convinced me that tests should be
   available by last call

   Kris_Krueger: why would this not be a rec across W3C?

   plh: it's not easy to enforce
   ... some WGs will complain

   jj: amping the expectations on testing will help

   mc: it should be the rule, with exceptions allowed

   <Zakim> MichaelC_SJC, you wanted to say I now believe tests need to
   be ready by Last Call

   Elika_Etemad: implementations are needed to see how tests are
   working

   James_Graham: the process does not map to browser development
   reality

   Elika_Etemad: it's difficult to say when spec development is done,
   which makes a hard deadline difficult

   John_Jansen: problems often cause the specs to move backward

   Elika_Etemad: CR is test the spec phase, not fixing bugs in browsers
   ... having to move CR back due to bugs is an issue, we need an
   errata process to allow edits in CR

   plh: we are not here to fix the W3C process

   John_Jansen: the more times you go thru the circle
   (edit/implement/test) the better, and also the earlier

   James_Graham: when we implement we write the tests... test suites
   should not be closed

   <fantasai> James_Graham: The state of the spec is irrelevant to when
   we write tests

   Mike_Smith: the Testing IG is scoped broadly, perhaps too much so.
   The IG will decide what its products will be, e.g. a best practice
   on when test suites are developed.
   ... writing this down even if we do not fix the process will help
   others avoid the same mistakes of the past
   ... it will still have some value

   Wilhelm_Andersen: how do you run tests, what is automated, is
   development in-house?

   Kris_Krueger: write our own tests

   plh: from JQuery?

   Kris_Krueger: no, customer feedback is also considered
   ... e.g. Gmail support provides feedback
   ... have a lot of automated tests, ship every Tuesday, and get quick
   feedback from users/developers

   Narayana_Babu_Maddhuri: is there any review of the test cases to
   determine whether a test is valid, and validation of the test results?

   plh: the metadata of the test log should clarify what is being
   tested

   Kris_Krueger: pointing to where the test relates to the spec is
   helpful

   plh: we cannot force metadata into tests, but we can encourage this
   info to help ensure test value clarity

   Narayana_Babu_Maddhuri: good reporting would be helpful

   plh: knowing e.g. what property works across devices and platforms
   is a goal, and matching tests to specs would support that

   James_Graham: knowing why something is failing is sometimes
   difficult, dependencies are not clear and why the test failed is
   unclear

   <plh> [lunch]

   <MichaelC_SJC> == Lunch break is 1 hour ==

   <ctalbert_>
   http://people.mozilla.org/~ctalbert/automationpresentation/Autom
   ation.html

      http://people.mozilla.org/~ctalbert/automationpresentation/Automation.html

Testing Firefox

   <krisk_> Firefox Testing Presentation

   <krisk_> clint: Tools automation lead at Mozilla

   <krisk_> Clint: overview of their testing

   <krisk_> Grown over the years

   <krisk_> Test Harnesses

   <fantasai> "Automation Structure: Test Harnesses"

   <fantasai> - C++ Unit

   <krisk_> C++ Unit testing, XPCShell, not too interesting for this
   group

   <fantasai> - XPCShell (javascript objects)

   <fantasai> - Reftest

   <fantasai> -Mochitest

   <fantasai> -UI Automation Frameworks

   <fantasai> - Marionette

   <krisk_> Mochitest - tests dom stuff

   <krisk_> New UI automation framework - Marionette

   <krisk_> Reftest drill down

   <fantasai> "Reftest: style and layout visual comparison testing"

   <fantasai> Reference: <p><b>This is bold</b></p>

   <fantasai> Test: <p style="font-weight: bold">This is bold</p>

   <fantasai> clint: The test and the reference create the same
   rendering in different ways.

   <fantasai> clint: Then we take screenshots and compare them pixel by
   pixel
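
   [A minimal reftest pair along these lines; the manifest line uses
   Mozilla's reftest.list syntax, and the file names are illustrative:]

   <!-- bold-test.html: renders bold text via a style rule -->
   <!DOCTYPE html>
   <p style="font-weight: bold">This is bold</p>

   <!-- bold-ref.html: renders the same thing using <b> -->
   <!DOCTYPE html>
   <p><b>This is bold</b></p>

   # reftest.list: pass if the two files produce identical screenshots
   == bold-test.html bold-ref.html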

   <fantasai> clint: Mochitest is an HTML file with some javascript in
   it.

   <fantasai> clint: One of the libraries it pulls in is the SimpleTest
   library.

   <fantasai> clint: It has the normal asserts: ok, is, stuff to
   control whether asynchronous or not

   <fantasai> clint: This other file here (in this example) turns off
   the geolocation security prompts

   <fantasai> clint shows a geolocation test

   <jhammel> ^
   http://mxr.mozilla.org/mozilla-central/source/dom/tests/mochites
   t/geolocation/test_allowWatch.html

      http://mxr.mozilla.org/mozilla-central/source/dom/tests/mochitest/geolocation/test_allowWatch.html
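
   [A minimal sketch of a Mochitest using the SimpleTest asserts just
   mentioned; the harness paths are the standard mochitest ones, the
   test body is made up:]

   <!DOCTYPE html>
   <title>Example mochitest</title>
   <script src="/tests/SimpleTest/SimpleTest.js"></script>
   <link rel="stylesheet" href="/tests/SimpleTest/test.css">
   <div id="content"></div>
   <script>
   SimpleTest.waitForExplicitFinish();       // asynchronous test
   ok(true, "harness loaded");               // boolean assert
   is(1 + 1, 2, "equality assert");          // equality assert
   window.addEventListener("load", function () {
     ok(document.getElementById("content"), "content div exists");
     SimpleTest.finish();                    // report results to the harness
   }, false);
   </script>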

   <fantasai> plh: How does this route around the security checks?

   <fantasai> clint: uses an add-on

   <fantasai> clint: has a special powers api

   <fantasai> "Marionette: Driving Gecko into the future"

   <fantasai> This is a mechanism we can use to drive any gecko-based
   application either by UI or by inserting script actions into its
   various script contexts.

   <fantasai> How it works -

   <fantasai> 1. socket opened from inside gecko

   <fantasai> 2. Connect to socket from test harness, either local or
   remote

   <fantasai> 3. Send JSON protocol to it

   <fantasai> 4. Translates JSON protocol into browser actions

   <simonstewart> uses webdriver json protocol streamed over sockets
   directly

   <fantasai> 5. Send results back to harness in JSON

   <jhammel> wiki page:
   https://wiki.mozilla.org/Auto-tools/Projects/Marionette

      https://wiki.mozilla.org/Auto-tools/Projects/Marionette

   <jhammel> (WIP)

   <fantasai> clint: We run all of these tests on every checkin, on
   every tree we build on.

   <fantasai> clint: Goes into a dashboard

   <fantasai> slide: shows screenshot of TinderboxPushLog

   <fantasai> wilhelm: Can we steal your Mochitests? What do we need to
   do to do so?

   <fantasai> clint: Check them out of the tree and see how well they
   run in Opera

   <fantasai> clint: Some of the stuff we did, e.g. special powers
   extension,

   <fantasai> clint: but it's now a specific API (used to be scattered
   randomly throughout tests)

   <fantasai> clint: If you had something similar and named it
   specialpowers, then you could use that to get into your secure
   system

   <fantasai> clint: So should be possible.

   <fantasai> clint: A lot of tests we have in the tree are completely
   agnostic; don't do anything special at all, should work today

   <jhammel> mochitests are at
   http://hg.mozilla.org/mozilla-central/file/tip/testing/mochitest

      http://hg.mozilla.org/mozilla-central/file/tip/testing/mochitest

   <fantasai> wilhelm: Are there plans to release these tests to
   geolocation wg?

   <fantasai> clint: I think they already did. The guy who wrote the
   tests is on that wg

   <fantasai> kk: ... they're hard-coded to use the Google service. If
   you don't use it, they don't run...

   <fantasai> kk: Not too many though

   <fantasai> some discussion of sharing tests

   <fantasai> Alan: I think WebKit is using some Mozilla reftests, but
   not using them as reftests

   <fantasai> kk: I'm fine w/ reftests. But of course won't work for
   everything.

   <fantasai> kk: CSS tests we wrote are self-describing.

   <fantasai> Alan: do you have automation?

   <fantasai> kk: Yes

   <fantasai> rakesh: Do you run the tests every day?

   <fantasai> clint: Every checkin

   <fantasai> clint: Different trees run different numbers of tests.

   <jhammel> https://tbpl.mozilla.org/

      https://tbpl.mozilla.org/

   <fantasai> clint: Our goal is to have test results back within 2
   hours. Right now we're averaging 2.5hrs

   <fantasai> fantasai: You're responsible for watching the tree and
   backing out if you broke something.

   <fantasai> discussion of test coverage

   <fantasai> discussion of subsetting tests during development

   <fantasai> wilhelm: How much noise do you have?

   <fantasai> clint: Don't know about false positives

   <fantasai> clint: Probably not many; once we find one, we check for
   that pattern elsewhere

   <jhammel> orange factor, for tracking failures:
   http://brasstacks.mozilla.com/orangefactor/

      http://brasstacks.mozilla.com/orangefactor/

   <fantasai> clint: Thing we really have is intermittent failures

   <fantasai> clint: We're trying really really hard to bring it down

   <fantasai> clint: Used to be on every checkin you'd get, on average,
   8 intermittent failures

   <fantasai> clint: we pushed it down to 2

   <fantasai> clint: And then we added the Android tests

   <fantasai> clint: trying to bring it down again

   <fantasai> duane: Can I instrument Marionette today in FF7?

   <fantasai> clint: No, code we're depending on now is landing
   currently on Nightly

   <fantasai> clint: Released probably... May?

   <fantasai> clint: Depending on work done by Developer Tools group

   <fantasai> clint: They have a remote debugging protocol they're
   implementing

   <fantasai> clint: Will be really nice; decided this would be great
   to piggyback on. Don't need two sockets in lower-level Gecko.

   <fantasai> clint: So won't be available until that's released.

   <fantasai> clint: Currently in a project repo... land in Nightly in
   ~2.5 weeks

   <fantasai> plh: Marionette is only for Fennec, not for desktop
   version?

   <fantasai> clint: For Fennec right now. Planning to go backwards and
   use for Desktop as well.

   <fantasai> clint: My goal is to move all our infrastructure towards
   that

   <fantasai> kk asks about reducing orange

   <fantasai> clint: It's mostly a one-by-one effort of fixing the
   tests

   <simonstewart> Interesting comment about avoiding using setTimeout
   in tests

   <fantasai> kk: Are you going to take Mochitests into W3C? Anything
   preventing you?

   <fantasai> clint: Nothing right now. We'd have to clean them up and
   make them cross-browser. Good for everyone, not opposed, just a
   matter of finding people and time

   <fantasai> jgraham: there's a bug on making testharness.js look like
   Mochitest to Mozilla

Testing Opera

   <fantasai> "This looks vaguely familiar"

   <fantasai> wilhelm: Say a few words about testing at Opera

   <fantasai> wilhelm: We have a mainline, which is supposedly always
   stable, and then when we're developing a feature, it gets branched
   and at some point tests start passing (that's the yellow, b/c out of
   sync with mainline) and then we merge and that becomes mainline

   <fantasai> diagram shows mainline with six green dots going forward

   <fantasai> branch goes off, two red dots, one yellow

   <fantasai> arrow from mainline to green dot on feature branch

   <ctalbert_> The wiki page we(mozilla) wrote that details our
   "lessons learned" from fixing intermittently failing tests is here:
   https://developer.mozilla.org/en/QA/Avoiding_intermittent_orange
   s

      https://developer.mozilla.org/en/QA/Avoiding_intermittent_oranges

   <fantasai> arrow from green dot back to green dot on mainline

   <fantasai> jgraham: ...

   <fantasai> jgraham: Our setup's a bit different

   <fantasai> jgraham: All the tests are in subversion in their own
   repository that's separate from the code. It's just a normal
   webserver: apache, php

   <fantasai> jgraham: When you ask for tests to be run, they get
   assigned from the server and we send them out to a couple hundred
   virtual machines

   <fantasai> jgraham: not quite MSFT's setup

   <fantasai> jgraham: And then we store every result of every test

   <fantasai> jgraham: I think you just store "did all the tests
   pass"... we store, in this build, this test passed.

   <fantasai> jgraham: We have a huge database of this information

   <fantasai> jgraham: Theoretically we can delete stuff, but we store
   everything.

   <fantasai> jgraham: In a mainline build from yesterday, we ran
   quarter of a million tests

   <fantasai> jgraham: That's not quarter million files -- it's 60,000
   files, some of which produce multiple results

   <fantasai> jgraham: e.g. some tests from HTML5 test in W3C, one file
   might produce 10,000 results

   <fantasai> jgraham: Typically it's a JS thing and it just runs a
   bunch of code and at the end it has some results

   <fantasai> jgraham: Dumps them to the browser in some way

   <fantasai> jgraham: The way we do that right now is pretty stupid,
   so I won't talk about it

   <fantasai> slide: Visual tests, JS tests, Unit tests, Watir tests,
   Manual tests :(

   <fantasai> jgraham: System was designed 7 years ago or so

   <fantasai> jgraham: For visual tests, you just take a screenshot,
   and then we store the screenshot.

   <fantasai> jgraham: Someone manually marks whether that screenshot
   was a pass or fail.

   <fantasai> jgraham: Don't do that. You have to do it once per test,
   and then once any time anything changes very slightly

   <fantasai> jgraham: e.g. introduce anti-aliasing test, have to
   re-annotate all tests

   <fantasai> jgraham: this format is deprecated

   <fantasai> wilhelm: We have 20,000 tests on 3 different Opera
   configurations...

   <fantasai> wilhelm: We want to kill these tests and use reftests
   instead

   <fantasai> jgraham: Oh, reftests should be on that list too

   <fantasai> jgraham: Recently we implemented reftests, and we're
   actively trying to move tests to reftests.

   <fantasai> jgraham: You can't test everything with reftest, but when
   you can it's much better

   <fantasai> Alan: Do you keep track of when the reference file bitmap
   changes?

   <fantasai> Alan: What if both the reference and the test change
   identically such that the test should fail but doesn't?

   <fantasai> plinss: In the case of the CSSWG when we have a fragile
   reference, we have multiple references that use different techniques

   <fantasai> jgraham: We have a very lightweight framework we used to
   use for JS tests. Only allowed one test per page.

   <fantasai> jgraham: Easy to use, but required a lot of convoluted
   logic for each pass/fail result.

   <fantasai> jgraham: For new test suites, we're using testharness.js

   <fantasai> jgraham: similar to Mozilla's MochiKit

   <fantasai> jgraham: Unit tests are C++ level things not worth
   talking about here

   <fantasai> jgraham: When things need automation, we use Watir --
   discussed this morning

   <fantasai> jgraham: When all else fails, we have manual tests

   <fantasai> wilhelm: Notice that the monkey looks really unhappy

   <fantasai> jgraham: For the core of Opera, we schedule a test day
   and just run tests

   <fantasai> plh: How many manual tests do you have?

   <fantasai> wilhelm: around 2000 before, less now...

   <fantasai> wilhelm: Probably spend about a man-year on manual tests
   per year

   <fantasai> wilhelm: Say some things about challenges we have, things
   we need to take into account when writing tests internally and for
   W3C

   <fantasai> wilhelm: First thing is device independence

   <fantasai> wilhelm: We run 3 different configurations of Opera:
   Desktop profile, Smartphone profile, and TV profile

   <fantasai> wilhelm: Almost every time someone requests a build, it
   will be tested on those three profiles

   <fantasai> wilhelm: We notice that if you have a static timeout in
   your test, e.g. wait 2s before checking result, that will break on
   stupid profile with low resources

   <fantasai> wilhelm: On some platforms we automatically double or
   triple it, and we hope it works, but it's not really a good solution
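
   [The static-timeout problem in miniature -- a hedged illustration,
   not Opera code: the fixed wait breaks on a slow device, the polling
   version does not:]

   // Fragile: assume the asynchronous work finishes within a fixed 2s.
   var done = false;
   setTimeout(function () { done = true; }, 500);   // "work" of unknown duration
   setTimeout(function () {
     console.log("fixed wait saw done = " + done);  // false on a slow device
   }, 2000);

   // More robust: poll for the condition (or better, listen for an event).
   (function poll() {
     if (done) { console.log("poll saw completion"); }
     else { setTimeout(poll, 100); }
   })();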

   <fantasai> jgraham: How do you deal with ... ?

   <fantasai> clint: we time out our tests after a set time period and
   mark it as failed

   <fantasai> jgraham: The main assumption is: don't depend on device
   size or speed -- or the test will randomly fail.

   <fantasai> wilhelm: Brings me to the next problem: randomness

   <fantasai> wilhelm: If you have so many tests and even small
   percentage fail randomly, going to spend man-years investigating
   those failures

   <fantasai> wilhelm: When we add new configurations, when we steal
   tests from source of unknown quality, we spend many man-years
   stamping out randomness in the tests

   <fantasai> wilhelm: The more complex the test, the more likely to
   randomly fail

   <fantasai> wilhelm: Simplest tests are JS.

   <fantasai> wilhelm: For imported tests from random sources, could be
   very bad

   <fantasai> wilhelm: Then comes visual tests

   <fantasai> wilhelm: Sometimes complexity is needed, but if can
   simplify will do that

   <fantasai> wilhelm: We have a quarantine system: run 200 times on
   test machines first to make sure it's good

   <fantasai> wilhelm: Still, sometimes things slip through.

   <fantasai> wilhelm: We steal your tests. Thank you.

   <fantasai> slide: jQuery, Opera, Chrome, Microsoft, mozilla, W3C

   <fantasai> wilhelm: Keeping in sync with the origin of the test is
   difficult

   <fantasai> wilhelm: When someone updates a test elsewhere, we don't
   automatically get that

   <fantasai> wilhelm: When we muck about with the test to get it to
   work on our system, we have to maintain patches

   <fantasai> wilhelm: If we fix bad tests, sometimes easy to
   contribute back, but sometimes not

   <fantasai> wilhelm: Automating tests to use our Watir scripts can
   also become a problem.

   <fantasai> wilhelm: Our current approach is not usable

   <fantasai> wilhelm: need a better way for us all to keep in sync

   <fantasai> kk: This is why we have submitted and approved folders

   <fantasai> jgraham: The problem from our POV is really .. part of it
   is a version control problem on our end

   <fantasai> jgraham: Don't have a good way to keep our patches
   separate from upstream changes

   <fantasai> jgraham: If we have W3C tests, and we pull a new version,
   don't have a way to say "these are bits we changed to make it work
   on our version"

   <fantasai> jgraham: ... reporting and script file separate

   <fantasai> jgraham: if we pull some tests from Mozilla, say, and
   they're JS engine tests and they update them, if we try and merge
   them.. someone has to work out how to do that by hand. It's kind of
   a nightmare.

   <fantasai> wilhelm: Last thing about randomness, esp imported

   <fantasai> wilhelm: Some tests rely on external tests.

   <fantasai> wilhelm: Great when we only had a few tests

   <fantasai> wilhelm: But now it's a problem. Servers go down, etc.

   <fantasai> wilhelm: Conclusion there is: don't do that. :)

   <fantasai> wilhelm: That's it!

   <fantasai> jhammel: Wrt upstream tests, standardizing on formats and
   standardizing on process

   <fantasai> wilhelm: We set up time at 3:15 today to discuss this
   exact issue

   <fantasai> mc: You say you have to fix tests to work on your
   product.

   <fantasai> mc: Question is how do you separate fixing test to be not
   random, vs. making them work on a particular product

   <fantasai> jgraham: When we pull in tests, we try not to change
   anything to do with the test.

   <fantasai> jgraham: We don't require the tests to pass to be in our
   system.

   <fantasai> jgraham: The thing we need to change is, can this test
   report back to our servers.

   <fantasai> jgraham: But external tests are usually not designed that
   way.

   <fantasai> wilhelm: I think testharness.js approach is good, because
   those are separated.

   <krisk_> That is the end of Opera's presentation

   <krisk_> The next person up is peter from HP on css wg update (10
   minutes)

   <krisk_> Then a discussion on rendering tests for about 1 hour

Testing in the CSS WG

   <krisk_> test.csswg.org

   <krisk_> has lots of information on CSS WG testing

   <krisk_> Tests are 'built' from xml into multiple formats - html,
   xhtml, etc...

   <krisk_> Test harness is a wrapper around the tests that are loaded
   in an iframe

   <krisk_> It loads the tests that have the fewest results first

   <krisk_> The harness has a filter for spec section, etc..

   <krisk_> The harness has meta-data description for each of the tests

   <stearns> test format requirements:
   http://wiki.csswg.org/test/css2.1/format

      http://wiki.csswg.org/test/css2.1/format

   <krisk_> The harness also has test results that can be shown for
   each of the browser/engine versions

   <krisk_> Build process has requirements that will be improved
   over time - metadata, ref test, title, etc...

   <krisk_> Adding meta-data helps review process, though most
   submitters don't like to add this data
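
   [A sketch of the metadata the format requirements call for, on a
   hypothetical test; the link targets and reference file name are
   illustrative:]

   <!DOCTYPE html>
   <html>
    <head>
     <title>CSS Test: background-color paints the element's background</title>
     <link rel="author" title="Example Author" href="mailto:author@example.org">
     <link rel="help" href="http://www.w3.org/TR/CSS21/colors.html#propdef-background-color">
     <link rel="match" href="reference/green-square-ref.html">
     <meta name="flags" content="">
     <meta name="assert" content="background-color paints the background of the element.">
    </head>
    <body>
     <p>Test passes if there is a green square below.</p>
     <div style="width: 100px; height: 100px; background-color: green"></div>
    </body>
   </html>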

   <krisk_> Multiple refs for the same test exist and a negative ref
   test as well

   <krisk_> You can have two ref tests if the spec has two different
   results - for example margin collapsing

   <krisk_> If a ref test can't be used then in some cases a
   self-describing test works

   <plinss> http://test.csswg.org/annotations/css21/

      http://test.csswg.org/annotations/css21/

   <krisk_> Spec annotations are used that map back to the annotated
   spec

   <krisk_> The annotated spec has total tests and results for each
   section of the spec

   <krisk_> Now on to the test review system

   <krisk_> http://test.csswg.org/shephard/

      http://test.csswg.org/shephard/

   <krisk_> Very tight coupling to the css test metadata

   <krisk_> Tracks history and other information about a test case

   <krisk_> jgraham: is this tied to the test file?

   <krisk_> peter: no it's possible to have this information in another
   file

   <krisk_> jgraham: can this handle a case when multiple files are
   used to create a lot of tests

   <krisk_> peter: yes we have the same issue for the media query test
   cases

   <krisk_> Wilhelm: So does css still use visual non-ref tests?

   <krisk_> fantasai: for css3 we require ref-tests, so no

   <krisk_> peter: The system is built to save time and automate parts

   <krisk_> peter: for example when a test is approved it is moved from
   submitted to approved

   <krisk_> Michael: Does the system have access control checks for
   approval?

   <krisk_> peter: yes

Testing Chrome

   <krisk_> Ken: Chrome Testing Information

   <simonstewart> kk: works on the chrome automation team

   <simonstewart> kk: not an automation group in the same sense as
   mozilla

   <simonstewart> chrome depends on webkit

   <krisk_> kk is not krisk

   <simonstewart> webkit layout tests, pixel-based tests

   <simonstewart> kk == ken_kania

   <simonstewart> kk: dom dump tree tests

   <simonstewart> kk: not got a lot of insight into the specifics of
   the webkit tests. Focuses mainly on the chrome browser

   <simonstewart> kk: couple of layers of testing

   <simonstewart> kk: lowest layer is the c++ browser tests

   <simonstewart> kk: probably more than other browsers do. Special
   builds of chrome which will run C++ in the ui thread

   <simonstewart> kk: relatively low level, though

   <simonstewart> kk: beyond those, there are the ui test framework.
   Based on the automation proxy (AP)

   <simonstewart> kk: ap is pretty old, but is an ipc mechanism

   <simonstewart> kk: very much internal facing

   <simonstewart> those tests are still fairly low level, despite being
   called ui tests

   <simonstewart> kk: higher than that, Ken's team work on something
   called the chrome bot

   <simonstewart> kk: runs on real and virtual machines

   <simonstewart> kk: keeps a large number of sites in a cache.
   Often used for crash testing. Also include tests that perform random
   ui actions

   <simonstewart> kk: a little bit smarter than pure random, but that's
   the gist

   <simonstewart> kk: qa level tests. Tests that are done by manual
   testers. Piggy back off the ui test automation framework. things
   like creating bookmarks, installing extensions, etc

   <simonstewart> kk: break down manual testing into two parts. First,
   app compat: push a new release of chrome and check it continues to
   work. Second, testing chrome at the ui level

   <simonstewart> Most of the ui is "based on the web"

   <simonstewart> For the chrome specific native widgets there are
   manual tests

   <simonstewart> kk: app compat depends on webdriver

   <simonstewart> kk: lots of google teams depend on webdriver to
   verify that sites work.

   <simonstewart> kk: guess that at a high level, the testing strategy
   tends to be developer focused.

   <simonstewart> kk: devs should write the tests in whatever tool and
   harness is most expedient for their purpose

   <simonstewart> kk: piggy back a lot on the fact that chrome does
   rapid releases. 4 channels release to users (canary, dev, beta,
   stable)

   <simonstewart> kk: different release schedules

   <simonstewart> kk: depend a lot on user feedback from the canaries

   <simonstewart> kk: that's the gist of it

   <simonstewart> tab: sounds good to me

   <simonstewart> jhammel: do chrome do performance testing?

   <simonstewart> kk: we do. Using the AP and the ui testing framework
   mentioned earlier

   <simonstewart> http://build.chrome.org

      http://build.chrome.org/

   <simonstewart> to view the tests that have been run

   <simonstewart> plh: do we run jquery tests

   <jhammel> ^ correction: http://build.chromium.org

      http://build.chromium.org/

   <simonstewart> kk: not really. webkit guys might, and we pick that
   up

   <simonstewart> krisk_: do you create tests and feed them back

   <simonstewart> TabAtkins: we don't do much, but we do

   <simonstewart> krisk_: is that because it doesn't fit with the
   systems

   <simonstewart> TabAtkins: the ways we write and run tests isn't
   really compatible with the existing w3 systems.

   <simonstewart> TabAtkins: would like to change that!

   <simonstewart> TabAtkins: some tests are html/js. which might be
   used where possible. Doesn't happen that regularly

   <simonstewart> krisk_: how do you know that you're interoperable?

   <simonstewart> TabAtkins: in terms of webkit stuff, it's a case of
   testing being done by different browser vendors

   <simonstewart> kk: lots of c++ tests that are specific to chrome

   <jhammel> simonstewart: np :)

   <simonstewart> krisk_: v8?

   <simonstewart> TabAtkins + kk: v8 team live in europe. Who knows?

   <simonstewart> wilhelm: also has legacy stuff for opera. New tests
   written in a way that (in theory) is usable outside. Can chrome do
   the same thing?

   <simonstewart> TabAtkins: will agitate for that. Involved in spec
   writing rather than active dev, so might be tricky

   <simonstewart> wilhelm: This is a great forum to raise those issues.
   Opera happy to share with Chrome if Chrome does the same :)

   <simonstewart> krisk_: do chrome try and pass a bunch of the w3c
   test suites?

   <simonstewart> TabAtkins: yes. Some of them might be integrated into
   the chromium waterfall. Some of them might be run by hand

   <simonstewart> ?? does anyone know about webkit testing

   <simonstewart> TabAtkins: the people I'd like to ask aren't
   around

   <simonstewart> webkit does seem to take in test suites from mozilla.
   They're running against a bitmap that's different from the moz
   rendering

   <simonstewart> TabAtkins: we don't have a good infrastructure for
   ref tests

   <simonstewart> TabAtkins: the test infrastructure people _do_ want
   to fix that

   <simonstewart> TabAtkins: every time a new port is added to webkit,
   there are more pixel tests. Provides pressure to do better

   <simonstewart> plh: any other questions?

   <simonstewart> 15 minute break coming up

   Info available from webkit: https://trac.webkit.org/wiki

      https://trac.webkit.org/wiki

   also see http://www.webkit.org/quality/testing.html

      http://www.webkit.org/quality/testing.html

   <krisk_> Next agenda item: jgraham talking about testharness.js

   <MichaelC_SJC> scribe: krisk_

testharness.js

   <fantasai> scribenick: fantasai

   jgraham: testharness.js is something I wrote to run tests.
   ... It runs JS tests specifically
   ... It's a bit like MochiTest or QUnit which JQuery uses, or various
   things

   <plh> --> http://w3c-test.org/resources/testharness.js
   testharness.js

      http://w3c-test.org/resources/testharness.js

   jgraham: Every JS framework has invented its own testharness
   ... This has slightly different design goals
   ... The overarching goal is that it's something we can use to test
   low-level specs like HTML and DOM
   ... So it can't rely on lots of HTML and DOM :)
   ... The design goals were to provide some API for writing readable
   and consistent tests in JS

   jgraham: Our previous harness at Opera, as I mentioned, didn't result
   in very readable tests

   jgraham: The other is to support testing the entire DOM level of
   behavior
   ... There are 2 test types: asynchronous tests and synchronous
   tests
   ... the second is purely syntactic sugar
   ... Another design goal was to allow possibility of the test to have
   multiple assertions, and all have to be true for test to pass
   ... typical example might be checking that some node has a set of
   children.
   ... Might want to first test for any children before testing that
   4th child is a <p>
   ... Multiple tests per file was a requirement; learning from Opera's
   1/file, which was painful for test writers and discouraged many
   tests
   ... it runs everything in try-catch blocks
   ... One feature of that is that every bit of the test is like a
   function, basically
   ... it tries to handle some housekeeping.
   ... if you have 1000 tests in a file, nice if you can time out those
   tests individually
   ... Uses setTimeout(); can override that if you want, e.g. if
   running on slow hardware
   ... and a design goal was easy integration with browsers' existing
   test systems
   ... Should be easy to use on top of MochiKit or whatever you use for
   reporting results
   ... next thing I thought I'd do is go through creating a test.

   jgraham's text editor:

   <script src="resources/testharnessreport.js"></script.

   <script src="resources/testharness.js"><script>

   <div id="log"></div>

   jgraham: By default testharnessreport.js is blank. It's for you to
   integrate into your testing system.
   ... the order is not at the moment relevant
   ... we might later check in testharness.js that testharnessreport.js
   was included

   added to file:

   (at the top)

   <title> Dispatching custom events</title>

   (at the bottom)

   <script>

   var t = async_test("Custom event dispatch");

   </script>

   jgraham: Each test has a number of steps, and each step is a
   function that gets called
   ... It gets called inside a try-catch block, and we can check if the
   test failed. We don't put anything as top-level code.

   (added at the bottom)

   t.step(function() {

   (ok, that's too much to type)

   jgraham: Here it's adding an event listener before the second step
   ... When it gets called, it'll call this other function here, which
   will run this other step, which is another function. Can get a bit
   verbose.
   ... There's a convenience method that will make this easier.. all
   documented in testharness.js
   ... Simple assert_equals() with value we get, value we expect, and
   then you can optionally have a string that describes what it is
   you're asserting.
   ... At this point everything we want done is done, so we say
   t.done();
   ... If you load this in a browser, because we have div#log, it will
   show whether it passes or fails and what assert failed
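
   [A hedged reconstruction of the kind of file being typed here, using
   the documented testharness.js API (async_test, step, step_func,
   assert_equals, done); the custom-event details are illustrative:]

   <!DOCTYPE html>
   <title>Dispatching custom events</title>
   <script src="resources/testharness.js"></script>
   <script src="resources/testharnessreport.js"></script>
   <div id="log"></div>
   <script>
   var t = async_test("Custom event dispatch");
   t.step(function() {
     var target = document.createElement("div");
     // step_func wraps the listener so exceptions are caught by the harness
     target.addEventListener("test", t.step_func(function(e) {
       assert_equals(e.type, "test", "event type seen by the listener");
       t.done();
     }), false);
     var ev = document.createEvent("Event");
     ev.initEvent("test", true, true);
     target.dispatchEvent(ev);
   });
   </script>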

   <plh> -->
   http://w3c-test.org/webapps/ElementTraversal/tests/submissions/W
   3C/Element-childElementCount.html Example of testharness.js

      http://w3c-test.org/webapps/ElementTraversal/tests/submissions/W3C/Element-childElementCount.html

   jgraham: That's all

   jj: Is there an id on the steps, so that you can say you failed step
   4 of test foo?

   jgraham: If there's demand, there could be a second argument there.

   jj: would be nice to know where it failed so I can set a breakpoint
   there

   jgraham: If you get a huge number of tests per file, it's usually
   auto-generated
   ... if it's failing in an assert, then it'll tell you which assert
   failed

   plh shows his example

   plh: everything shown here is generated by testharness.js

   jgraham: There's a failure in this, and it seems everyone fails
   that.

   plh: Bug in testharness.js

   jj: Easiest way to debug the test. Is there an error in the test,
   error in testharness.js, or error in browsers

   jgraham: There are various types of assertions. Usually corresponds
   to webIDL
   ... But what's in webIDL isn't always the same

   kk: It's pretty well-written, only 700 lines or so

   clint: If it's synchronous, you don't have to do t.step()

   jgraham: A test that is synchronous implicitly creates a step

   wilhelm: Opera currently uses this tool for all the new tests that
   we write. Can others use this?

   clint: Yeah, I think so

   kk: There used to be some nunit or something that W3C had
   ... Was in IE, but some browsers couldn't run it.
   ... Very complicated

   [server problems]

   plinss: Are tests grouped by section into files?

   jgraham: In this case, it checks reflection section, plus section of
   each part of the spec that defines a reflected attribute

   topic change

   wilhelm: plh wanted to talk about test harness, fantasai wanted to
   talk about syncing problem

How should we organize public test suites so that they are as easy as
possible to contribute to and reuse?

   http://w3c-test.org/framework/

   MikeSmith: This is an instance of the framework peter demoed

   Mike: I'm going to show you what has been added here to make it
   easier for test suite maintainers to add data to the system.
   ... There's this area called Maintainer Login
   ... It'll give you an http_auth, which authenticates against W3C's
   user database
   ... Email me if you want access to the system
   ... Once you go in there you'll see 2 options: add metadata, change
   metadata
   ... Can add a specification
   ... one early piece of feedback I got was they have tests they want
   to run that are not associated with a spec.
   ... So in this instance of the system, it's not a requirement to
   have a spec for your test suite
   ... You can give it an arbitrary ID as long as not a duplicate
   ... Title of the spec
   ... URL for the spec
   ... It expects you'll point it to a single-page version of the spec
   ... If you have a multi-page spec, don't point it at the TOC. You
   need the full version of the spec.
   ... Could change later, but initially set up this way 'cuz easier
   ... This will get added to the list here
   ... Next thing you can do is needed if you want to do what Peter was
   demoing earlier, which was associating testcases with specific
   sections of the spec -- or specific IDs in the spec
   ... Structured around idea that you put your IDs per section
   ... But some WGs, like the WOFF WG, are putting assertions at the
   sentence level
   ... They don't actually have section titles, so needed to
   accommodate that too

   Peter: Alan and fantasai did some work on that, too.
   ... Shepherd tool will be able to parse out spec to find test
   anchors
   ... and then can report testing coverage of the spec, so this is
   something we will automate

   Alan: What fantasai and I worked out was based on WOFF work, but
   will be simpler for spec editors. A bit harder to automate, though

   Mike: This part adds spec metadata.
   ... Instead of a form to fill out, it lists existing specs in the
   system
   ... once you go here, if there's already data in the system, will
   show you data in the system already
   ... otherwise it'll show you generated data
   ... This parses the spec and pulls out the headings. If it looks ok,
   you press submit
   ... It'll put these section titles into the database.
   ... If you have IDs below the section title level, then you'll have
   to use a different way to get it into the DB
   ... You might have to get me to do it for now :)
   ... Those steps are optional right now.
   ... What is necessary is going in and giving info about the test
   suite itself.
   ... you can give it an arbitrary ID
   ... Title, longer description
   ... to better explain the test suite
   ... base URL of where your test suites are stored
   ... Difference from CSS is, that one requires format subdirectories

   plinss: it's optional

   Mike: This one doesn't expect subdirectories. Expects all tests in
   this one directory
   ... If you have separate subdirectories...
   ... Need to make different test suites or ...
   ... Simplest case you have all tests in one directory

   plinss: The code's actually a lot more flexible wrt formats. We'll
   talk offline.

   MikeSmith: Then you have contact information for someone who can
   answer questions about test suites
   ... Then you indicate format of the test suite
   ... Then you have a list of flags, you can select which ones
   indicate optional tests
   ... There are ways to add flags to the system
   ... No ui for it, so contact me
   ... Last thing you then do is upload a manifest file
   ... You have to have a test suite
   ... You select a test suite
   ... and then what I have it do right now is that you need to point
   it to the url for a manifest file, and it'll grab that and read it
   in
   ... Right now two forms of manifest files that it will recognize
   ... second one here is just a TSV that expects path/filename,
   references, flags, links, assertions
   ... links are the spec links
   ... The other big change is, I was talking with some people e.g.
   annevk and ms2ger
   ... the format they're using is just listing the filenames
   ... it marks support files as support files
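
   (For illustration: a single row of the TSV manifest form described
   above might look like the following, with the fields in the order
   path/filename, references, flags, links, assertions. The fields are
   shown here separated by spaces for readability but are tabs in the
   actual file, and all file names and values are invented, not taken
   from a real suite.)

      example-feature/example-001.xht  example-001-ref.xht  ahem  #example-section  Example assertion text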

   kk: Mozilla guys wanted to know what files were needed to pull to
   run a test case

   plinss: In the CSSWG, the large manifest file with metadata -- that
   gets built by the build system

   MikeSmith: This form expects the full filename, not just the
   extensionless filename
   ... Because that's what they had
   ... Once you have that, you should be able to get your test cases
   into the test database
   ... and it'll show up on the welcome page
   ... Long way to go on this.
   ... Goal when I started on this was to get it to the point where I
   didn't have to manually do INSERT in SQL to get specs into the
   database
   ... What would be really nice is if ppl start using this and getting
   more test suites in there so that we can ..

   plinss: But right now only limited set of ppl can contribute to that
   code

   MikeSmith: I created two groups in our database
   ... I created a group for developers -- anyone who wants to
   contribute to framework
   ... That'll give you write access to hg repo for the source code for
   this
   ... Take a look at source code and see problems, send me patches or
   I'll give you direct access
   ... Second thing is if you want to have access to use this UI to
   submit test suite data, I'll have to add you to a particular group

   fantasai: how is this code related to plinss's code?

   MikeSmith: It's forked from that.
   ... I've just been pulling the upstream changes
   ... been able to merge everything without it breaking.
   ... Think it's in good enough shape that we could port it back
   upstream

   plinss: This system and the Shepherd share a lot of the same base
   code
   ... There are lots of things from the Shepherd system I was going to
   port back into this system, and then pull your stuff in too
   ... Mike also has code that ties into the testharness.js code, and
   will automatically submit results from that

   MikeSmith: If you go to enter data, it gives you some choices about
   whether you want to run full test suite or not
   ... There's a button here that will pull automatic results where
   possible
   ... Be careful, this will submit the data publicly!
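
   (For illustration: testharness.js exposes add_completion_callback, so
   a test page can collect its own results and post them to a server.
   The sketch below is not Mike's actual code, and the submission
   endpoint URL is invented.)

      // Collect per-test results when the harness finishes and POST them.
      add_completion_callback(function (tests, harness_status) {
        var results = tests.map(function (t) {
          return { name: t.name, status: t.status, message: t.message };
        });
        var xhr = new XMLHttpRequest();
        xhr.open("POST", "/framework/submit-results", true); // hypothetical URL
        xhr.setRequestHeader("Content-Type", "application/json");
        xhr.send(JSON.stringify(results));
      });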

   jgraham: Not saying it's a bad idea, but from our POV, we're not
   going to use it offline.

   (Bryan was talking about trying out the system privately offline)

   plinss: The system tracks who's submitting the data. By login if
   you're logged in, by IP if not

   Bryan: Privacy is useful

   plinss: goal is for pulling data from as many sources as possible

   wilhelm: fantasai wanted to talk about keeping things in sync

   <dobrien> Is someone scribing? I can't keep up on the iPad

   <ctalbert_> This is the writeup that we are planning to set up at
   Mozilla for the CSS tests specifically:
   https://wiki.mozilla.org/Auto-tools/Projects/W3C_CSS_Test_Mirroring

      https://wiki.mozilla.org/Auto-tools/Projects/W3C_CSS_Test_Mirroring

   <krisk_> Mozilla has a way to move tests from mozilla -> w3c ->
   mozilla

   <ctalbert_> wilhelm: how will this cope with local patches?

   <krisk_> fantasai: The master copy only lives in one place...

   <ctalbert_> jgraham: probably not a problem with the css tests

   <krisk_> fantasai: approved is the master in w3c

   <krisk_> fantasai: submitted is the master for submissions

   <ctalbert_> jgraham: opera is thinking of having the master from w3c
   which is intact, and our checkout from that master will have the
   local patches, and when we pull we'll rebase our patches atop the
   w3c master

   <ctalbert_> this should be possible now that hg is in the w3c side
   and our (opera) side

   <ctalbert_> fantasai: we'll probably have to do something similar

   <krisk_> wilhelm: how does this handle local patches?

   <ctalbert_> jhammel: is there a technical limitation to not have
   people editing the w3c tests

   <ctalbert_> fantasai: no

   <krisk_> fantasai: this is only for css, which doesn't seem to have this
   problem

   <ctalbert_> jgraham: probably make it a commit hook

   <ctalbert_> ctalbert_: agreed

   <ctalbert_> peter: if someone pushes to the approved directory
   without actually being approved then the system just automatically
   denies them

   <ctalbert_> that may be incorrect ^ (scribe error)

   <ctalbert_> wilhelm: might be an idea to split test suites down at
   lower granularity levels so that you can have test suites with
   different levels of maturity

   <ctalbert_> jgraham: don't think that would make a difference tbh

   <ctalbert_> peter: our repo would keep all the data from all the
   suites in the repo so that our build system could build any version
   of them from any suite

   <ctalbert_> wilhelm: are there other things we can do to make it
   easier to contribute test suites?

   <ctalbert_> fantasai: one problem on the mozilla side - there's no
   place to put tests that should go to the w3c - we depend on a manual
   process to sort out which should be submitted and then it is done
   later

   <ctalbert_> fantasai: these tests just sit in a random place and are
   forgotten

   <ctalbert_> fantasai: once we have a directory that goes to w3c and
   we tell the reviewers, then it will help quite a bit.

   <ctalbert_> fantasai: the basic idea is to make it obvious what
   developers need to do with a test to indicate that it is appropriate
   and ready for w3c; then it should "just happen"

   <ctalbert_> jgraham: we have a similar problem. it's hard to surface
   those tests and bugfixes without a policy and a place for those
   tests

   <ctalbert_> peter: if we have a standard format among the test
   writers then it will be easier to help developers to upload the
   tests to the w3c. If the developers have to convert the tests it's
   too difficult and people won't expend the effort to make it happen

   <ctalbert_> krisk_: sometimes it depends on the editors as to when
   they allow tests into the spec, and you find that tests sometimes
   lag the spec by quite a bit

   <ctalbert_> fantasai: we found that with the css - the person
   writing the spec is often nominally tasked with also writing the
   test suite but because the skill sets are different and the spec
   editor is usually swamped, then the tests get neglected

   <ctalbert_> fantasai: we really need a dedicated person to manage
   these tests and testing effort for each spec

   <ctalbert_> MikeSmith: is there some way to motivate people to do
   that?

   <ctalbert_> MikeSmith: maybe we should publicly track the testsuite
   owner?

   <ctalbert_> fantasai: we can do that, but the burden is on getting
   resources for that, really.

   <ctalbert_> MikeSmith: yeah, the question is how do you encourage
   managers to allow their people to spend time on w3c work

   <ctalbert_> MichaelC_SJC: you might be able to convince your company
   to do that, but we also need to have the working group chairs
   understand that this needs to happen

   <ctalbert_> jgraham: if we have them already in an interoperable
   format then it's pretty easy, but for our existing tests that are in
   a different format, we aren't going to spend the time to convert
   them

   <ctalbert_> fantasai: we might just have a place at w3c to take
   those tests, and just post them publicly and have someone else do
   the conversion work

   <ctalbert_> jgraham: I suspect that's a wide problem

   <ctalbert_> krisk_: if you get in the habit of submitting stuff as
   you're doing development, that seems reasonable.

   <ctalbert_> krisk_: keeping things not super complex is a win, and
   being consistent will pay dividends

   fantasai^: Because for Opera it may not be valuable to do the
   conversion, but e.g. Microsoft might want those tests, and decide
   that the cost of converting is less than the cost of rewriting tests
   from scratch, so to them it'll be worth it to do the conversion

   <ctalbert_> fantasai: thanks, I'm not too good at this :/

   <ctalbert_> (scribe note ^)

   <ctalbert_> wilhelm: the more I think of this, the more I realize
   that facilitating the handover of tests is a full time job

   <Zakim> MichaelC_SJC, you wanted to ask how much should there be a
   "W3C format" vs how much does W3C framework need to format (nearly)
   any format?

   <ctalbert_> wilhelm: if we could get every browser vendor to commit
   one person to do this work on their team then that would be good.

   <ctalbert_> fantasai: the problem we're at now is that people haven't
   adopted the w3c formats internally

   <ctalbert_> it will be less work once that happens

   <ctalbert_> it's not w3c's responsibility to convert your tests to
   w3c

   <ctalbert_> fantasai: you can write a conversion script to convert
   your test to w3c format

   <ctalbert_> better to do that than to have w3c accept all the
   different formats

   <ctalbert_> jgraham: the problem is that many of these harnesses are
   not built for portability

   <ctalbert_> MichaelC_SJC: the problem with a common format (and I
   may be wrong) is that you run into things you can't test

   <ctalbert_> jgraham: if we run into that, then in that case maybe we
   can find some lightweight format for those tests, or in that case
   maybe we use a different type of harness

   <ctalbert_> scribe: ctalbert has to step out

   <ctalbert_> fantasai: ^

   scribe:

   kk: If you can write it with testharness.js, do that. If not, try
   reftest, if not, try self-describing test
   ... In your case you have the difficulty of needing a screenreader
   or something
   ...
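
   (For illustration: a minimal testharness.js test is a single HTML page
   that loads the harness and makes assertions in script. The feature
   checked below is only an example, and the /resources/ paths assume the
   harness files are served from the usual location in the repository.)

      <!DOCTYPE html>
      <title>document.title example</title>
      <script src="/resources/testharness.js"></script>
      <script src="/resources/testharnessreport.js"></script>
      <script>
        test(function () {
          // assert_equals(actual, expected, description)
          assert_equals(document.title, "document.title example",
                        "title IDL attribute matches the <title> element");
        }, "document.title reflects the title element");
      </script>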

   jgraham: If you can get ppl to contribute in one format, at least
   you solve the problem once per platform rather than once per test

   mc: I can agree with the idea that there's a hierarchy of goodness
   ... The framework should have at least the possibility of hooking in
   new formats

   general agreement

   wilhelm: For the Watir cases, we noticed areas where we'd want to
   add tests for something very obscure and specific. What we've done is
   add support at a low level in Opera and use an API
   ... Such things could be later added to WebDriver


   Alan: For tests where there isn't a w3c version, but browsers have
   something, is there a list of most-wanted specs that need tests on
   the w3c site

   fantasai: All of them? :)

   Alan: We were talking about poking ppl, committing ppl to
   translating browser tests to w3c tests
   ... Would be more successful at getting resources if we have a
   specific list of things we need

   jj: Also possibility to ask specific people.
   ... Rather than saying, please all submit tests for HTML5
   ... Say, can you submit tests for WebWorkers
   ... need a specific ask to get things done
   ... It might not cause an immediate surge in test submissions, but for
   me, coming from outside to inside, the idea of submitting tests seemed
   impossible. Didn't know where to submit them, figured they'd be
   rejected, didn't know what a reftest was, etc.
   ... So the process was hard, and the ask wasn't specific
   ... Better way to get things done is asking
   ... Would like Opera to submit WebWorker tests

   wilhelm: Can I get that in writing so I can show it to my manager?

   Alan: Identify the tests, see who has those tests, then request them

   plh: We've been working on the testing framework a little bit, but
   part of the task is also going out there in the wild and finding tests
   and getting them to W3C
   ... Need to get to the point where we have the framework and start
   asking for tests

   Alan: Use framework to identify areas, since it annotates the spec

   jj: We have no idea how much coverage those 47 tests have -- number
   isn't meaningful from a coverage perspective
   ... 1 is better than 0, but maybe 100 are needed, not 47

   ss: Test coverage is a negative thing. It'll only say what's not
   covered, not how well the covered areas are tested

   jj: Even if you say you have 100% on that normative statement, still
   doesn't tell you if you got all the edge cases

   jgraham: At the moment for HTML we have nothing, though.


   jgraham: We have our tests organized by section in the repo, but
   it's not explicit
   ... Being able to say per normative statement, do we have a test for
   this, is pretty nice

   <plh> --> http://www.w3.org/2011/10/timer.html (annoying) timer

      http://www.w3.org/2011/10/timer.html

   jgraham: If you look somewhere, there's an annotation per sentence
   in the spec showing tests for section X
   ... But that's really complicated, because spec isn't marked up to
   make that easy
   ... and testing dozens of disconnected statements

   kk: The problem we're struggling with is not how we get perfect
   coverage. There's a spec, and there's no coverage.
   ... Browsers all have this feature, and they don't work the same. So
   having some is a good start.

   Bryan: If you look at most of the WebAPIs near LC or at LC, only 1/3
   have tests available

   <jhammel> fantasai: setup a process for getting tests from *your*
   organization to w3c, and *going forward*, you should write
   w3c-submittable tests *and* submit the tests. Once that is in place,
   we can go back and convert legacy tests


   <jhammel> fantasai: we need to get the webkit people to commit to
   this

   <jhammel> fantasai: you can require that when checked into repo,
   they become reftests

   <jhammel> fantasai: plan going forward is to convert to reftest

   <jhammel> jgraham: if you're comparing to something bitmap-based, it
   may take 2x time, but it will save time going forward

   fantasai^: Because then the number of legacy tests that are not
   w3c-formatted stops growing, and we can work on making that number
   smaller
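
   (For illustration: a reftest of the kind discussed above is a pair of
   pages that must render pixel-for-pixel identically; the test links to
   its reference with rel="match". The file names and the property tested
   below are invented.)

      <!-- test page, e.g. background-green-001.html -->
      <!DOCTYPE html>
      <title>CSS Test: background shorthand paints the box green</title>
      <link rel="match" href="background-green-001-ref.html">
      <style>div { width: 100px; height: 100px; background: green; }</style>
      <div></div>

      <!-- reference page, background-green-001-ref.html -->
      <!DOCTYPE html>
      <title>CSS Reference</title>
      <style>div { width: 100px; height: 100px; background-color: green; }</style>
      <div></div>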

Additional Items

   example of a test that has to be self-describing: This tests that
   the blurring algorithm produces results within 5% of a Gaussian blur

   http://test.csswg.org/source/contributors/mozilla/submitted/css3-background/box-shadow/box-shadow-blur-definition-001.xht

      http://test.csswg.org/source/contributors/mozilla/submitted/css3-background/box-shadow/box-shadow-blur-definition-001.xht

   bryan: We developed a number of specs for device APIs
   ... We recognize these APIs are quite sophisticated, and it'll take
   some time, but we're continuing the development of these
   capabilities for web runtimes
   ... We have developer program, global ... ecosystem

   bryan (from AT&T): wanted very briefly ...

   bryan: show you these links to the specs, the APIs, but more
   importantly the test framework
   ... Test framework is based on QUnit
   ... Pulls in a file from a test directory, which has the list of
   tests associated with this particular API.
   ... Tests individual JS files in the same directory
   ... will run them one by one
   ... This is packaged up as a widget file, which is available for
   download
   ... So we can run all the tests for example using this widget
   framework.
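
   (For illustration: one of the QUnit-based checks bryan describes might
   look like the sketch below, using QUnit's global test/ok functions.
   The API being probed is only an example; this is not WAC's actual test
   code.)

      // Existence check for a device API, QUnit style. Assumes qunit.js
      // is already loaded by the widget's test page.
      test("geolocation API is exposed", function () {
        ok(navigator.geolocation, "navigator.geolocation exists");
        ok(typeof navigator.geolocation.getCurrentPosition === "function",
           "getCurrentPosition is a function");
      });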

   bryan shows pie charts of results

   bryan: Automatically uploaded and made available to vendor

   plh: Say 1000 tests for core web standards?

   bryan: No, for the APIs
   ... What comes from the underlying platform is inherently tested by that
   community
   ... We need to cover device variation
   ... identify things that we reference
   ... We have individual tests for these, test scripts
   ... this is more than an acid-level test, but not what we hope to see
   from W3C in long run
   ... We don't want to develop and maintain this level of detail in
   WAC. Want to leverage W3C test suites
   ... If you look at the tests, you can see for example the
   geolocation test suite, which we reference.
   ... We want to auto-generate the tests as widget

   jj: So if the test suite changes, do you update your widget?

   bryan: Our goal is to create frameworks where we can pull in tests
   and run them in this runtime environment without having to
   necessarily maintain the tests ourselves
   ... We would benefit from a common test framework
   ... What exactly these tests are is basically just a JS procedure
   ... We test existence of methods, call qunit functions for
   pass/fail, not necessarily married to this format, but it was the
   most common one at the time we developed this.
   ... So to summarize our goal is to have the scalability to support
   this widget-based ecosystem across dozens of devices across the
   world
   ... So we have to have scalability
   ... To depend on the core standards as something we don't spend a
   lot of effort on
   ... Duplicate things that eventually come from W3C.
   ... We'd like to see this developed at W3C so we can directly
   leverage it.

   fantasai comments on how this shows having a few common formats is
   better than having w3c accept many similarly-capable formats -- it
   better supports reuse of the tests

Conclusions and Action Items

   1. Vendors commit to running W3C tests

   2. Vendors push internally to adopt W3C test formats

   plh says W3C should make it easier for vendors to import suites

   fantasai: what does that entail?

   plh: make guidelines for WG

   jgraham: I feel the problem is more on our side than on W3C side

   wilhelm, jgraham: but of course, using hg instead of cvs is
   important for tests

   wilhelm: W3C should commit resources to get tests from vendors

   plh: start with webapps

   wilhelm: Any conclusions on WebDriver discussion?
   ... We commit to work on the spec, and get that into our browser

   plh: MS and Apple should look into that

   Mike: normal people at apple are interested, but they're not the
   ones who sign off on things

   kk: Using testharness.js seems to me a very low-hanging fruit,
   rather than writing a whole bunch of APIs

   <jhammel> "not buy Apple" would be more effective

   wilhelm: There should be a spec that talks about it; for the IP
   stuff, we need to get a spec out so there's less risk for those
   implementing

   jgraham: There was some discussion, but no decision, about which
   bindings W3C would accept tests in

   wilhelm: I'd list that as an open issue

   MikeSmith: We want to follow up with the testing IG, [other group]


   MikeSmith: Spec discussion would go to [... mailing list ...]

   wilhelm: Dumping ground for non-W3C-format tests

   kk: You can put whatever you want in submitted folder

   <MikeSmith> public-browser-tools-testing@w3.org

   jgraham: It would be nice, if ppl dump random test suites in random
   formats, to separate those out from things that would be approved in
   roughly their current form

   <MikeSmith>
   http://lists.w3.org/Archives/Public/public-browser-tools-testing/

      http://lists.w3.org/Archives/Public/public-browser-tools-testing/

   kk: We should have an old_stuff directory

   jgraham: And encourage people to dump stuff there

   <MikeSmith> for the Testing IG,
   http://lists.w3.org/Archives/Public/public-test-infra/ and
   public-test-infra@w3.org

      http://lists.w3.org/Archives/Public/public-test-infra/

   plh: We can associate a repo with the testing IG, and then anyone in
   that IG can push to the repo

   <plh> ACTION: Mike to create mercurial repositories for Web Testing
   IG and Browser Tools WG [recorded in
   http://www.w3.org/2011/10/28-testing-minutes.html#action01]

   fantasai: Should be clear that dumping things here is not the same
   as submitting to an official W3C test suite

   bryan: Should also have a wiki that documents what's there

   <ctalbert_> TabAtkins_: I accidentally locked myself on the patio,
   could you come rescue me?

   jj: Right, should be clear these are not submitted for review;
   they're there, and someone can take them and convert them and submit
   them

   <MikeSmith> http://www.w3.org/wiki/Testing

      http://www.w3.org/wiki/Testing

   jgraham: Come up with a prioritized list of things that need tests

   jj: anything that's in CR? :)

   plh: I'll take an action item to do that

   <scribe> ACTION: plh to make a list of things that need tests
   [recorded in
   http://www.w3.org/2011/10/28-testing-minutes.html#action02]

   bryan: Need a list of what's available, what are the key gaps, what
   do we need to get there

   kk: Identify specs that are in a bad situation.

   fantasai: Also want to track not just what needs testing, but ask
   vendors whether they have tests for any of these.
   ... Can then go pester people to submit those tests

   <scribe> ACTION: MikeSmith to Create repos for testing IG and
   testing framework group [recorded in
   http://www.w3.org/2011/10/28-testing-minutes.html#action03]

   plh: Need places to dump tests for groups that don't have repos atm
   ... more and more groups have their own test repo

   <plh> ACTION: plh to convince the geolocation WG to use mercurial
   for their tests [recorded in
   http://www.w3.org/2011/10/28-testing-minutes.html#action04]

   3. Vendors commit to finding a person to facilitate submission and
   use of W3C tests

   wilhelm: need to make a formal request to each organization

   bryan: Someone should pull together format descriptions and include
   the guidelines

   <plh> --> http://www.w3.org/html/wg/wiki/Testing/Authoring/
   Authoring Tests

      http://www.w3.org/html/wg/wiki/Testing/Authoring/

   discussion of where to collect this information

   <plh> --> http://www.w3.org/testing/ Testing

      http://www.w3.org/testing/

   jgraham: should be in a place not specific to a given working group
   ...

   plinss: There's a lot to be gained by standardizing metadata

   jgraham: hard to do the CSS way for an HTML test
   ... Could have n ways to do it, where n is a small number
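
   (For illustration: the CSS way of standardizing metadata that plinss
   and jgraham refer to is a set of link/meta elements in each test's
   head; the values below are invented.)

      <link rel="author" title="Example Author" href="mailto:author@example.org">
      <link rel="help" href="http://www.w3.org/TR/css3-background/#box-shadow">
      <meta name="flags" content="ahem">
      <meta name="assert" content="Box shadows are blurred by the specified blur radius.">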

   Alan: It would be nice to have everything on a wiki so we don't have
   to go through a staff member
   ... What if this page was a redirect to a wiki?

   jgraham: Could have that page be a link to a wiki

   MikeSmith: I like redirect idea, minimizes work I have to do :)

   wilhelm: So when should we meet again?

   jj: I think we should definitely make this a regular meeting.
   ... Seems like everyone in every WG is going to be solving the same
   problems
   ...

   plh: WebDriver will be under browser tools WG

   mc: Who's "we"?

   wilhelm: I don't know, but this crowd is great.

   plh: We can put under the IG

   fantasai: We can say at least that we'll meet again at next TPAC

   plh: Would be in France next year

   fantasai: Since not everyone will be travelling to TPAC, would we
   want to meet at another place at a different time as well?

   jj: Does everyone agree we should meet?

   kk: Depends on deliverables.

   MikeSmith: If we meet 6 months from now, when would that be?

   ?: April

   mc: Just want to be sure who the "we" is the invite would go out to

   wilhelm is designated in charge

   Meeting closed.

   RRSAgent: make minutes

Summary of Action Items

   [NEW] ACTION: Mike to create mercurial repositories for Web Testing
   IG and Browser Tools WG [recorded in
   http://www.w3.org/2011/10/28-testing-minutes.html#action01]
   [NEW] ACTION: MikeSmith to Create repos for testing IG and testing
   framework group [recorded in
   http://www.w3.org/2011/10/28-testing-minutes.html#action03]
   [NEW] ACTION: plh to convince the geolocation WG to use mercurial
   for their tests [recorded in
   http://www.w3.org/2011/10/28-testing-minutes.html#action04]
   [NEW] ACTION: plh to make a list of things that need tests [recorded
   in http://www.w3.org/2011/10/28-testing-minutes.html#action02]

   [End of minutes]
     _________________________________________________________
