F2F notes from Simon Stewart on 2016-07-02 (public-browser-tools-testing@w3.org from July to September 2016)

From: Simon Stewart <simon.m.stewart@gmail.com>
Date: Sat, 2 Jul 2016 10:25:07 +0100
To: "public-browser." <public-browser-tools-testing@w3.org>
Message-ID: <CAOrAhYHB3X=kQry_pB8HQf9uP3B3bt1ueawXkr+m3YiqA0ywAg@mail.gmail.com>
Hi everyone,

There's good news and bad news, and it depends on who you are, because it's
the same bit of news. I'm not going to be at the F2F session this time
round, though I am planning on attending the meeting in TPAC. Because I
won't be there, I thought it might be useful to put together a (sorry,
long) set of notes about the current agenda items.

Feel free to scroll down the notes in the sessions: it might make them more
bearable :) I'm currently working on shipping Selenium 3.0. Once we get the
beta out the door, my plan is to shift focus on to the spec, since that's
where I can have the most positive impact (and I hate multitasking).

Before I begin, there are two key things I bear in mind:

1) The spec's audience. These are "testers", "implementors", and "spec
authors". The key thing to note is that "testers" is the largest group by
far, and is a broad and diverse group, often with people with limited to no
control of their machines (that is, no admin access), and relatively
frequently with weak coding skills.

2) The design choices I laid out in the AOSA book
<http://www.aosabook.org/en/selenium.html>. Most important of these is that
the webdriver APIs were designed to emulate the user as closely as
possible. That's why the tool has continued to work as apps get more
sophisticated, and how we kept complexity out of the local-end APIs.

So, without further ado:

*Actions: Key Event keyboard layout issues*
The original design for this split keyboard input via the "do what I mean"
send-keys command into two paths. The simpler was for allowing simple
testing of i18n, and spammed the unicode characters into the event stream.
It would have been nice to do that by pretending that they were copied and
pasted, or provided via an IME, but I'm not that good a native coder in
Windows. The more complex but common path looked up key codes from the OS
and attempted to honour the current keyboard layout. That seemed like the
best idea, and it still does to me.

Of course, the JS implementations had no idea what the current keyboard
layout was, so they defaulted to 10-some-number US keyboard layout. I
suspect that choice is causing some confusion. I think we can now do a lot
better, and believe that mapping the inputs to the current keyboard is a
sane thing to do.
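
A minimal sketch of the two input paths described above, assuming a
per-layout lookup table. The tables, the event tuples, and the name
plan_send_keys are hypothetical and only illustrate the idea; a real
remote end would ask the OS for the active layout.

```python
from typing import Dict, List, Tuple

# Hypothetical excerpts of two layouts: character -> (physical key, needs shift).
Layout = Dict[str, Tuple[str, bool]]

US_LAYOUT: Layout = {
    "a": ("KeyA", False), "A": ("KeyA", True),
    "2": ("Digit2", False), "@": ("Digit2", True),
}
UK_LAYOUT: Layout = {
    "a": ("KeyA", False), "A": ("KeyA", True),
    "2": ("Digit2", False), '"': ("Digit2", True),
    "@": ("Quote", True),
}


def plan_send_keys(text: str, layout: Layout) -> List[Tuple[str, str, bool]]:
    """Plan one event per character of `text`.

    Characters the layout can type become ("key", physical_key, shift).
    Anything else falls back to ("unicode", char, False), i.e. the
    character is pushed straight into the event stream.
    """
    events = []
    for char in text:
        if char in layout:
            key, shift = layout[char]
            events.append(("key", key, shift))
        else:
            events.append(("unicode", char, False))
    return events
```

The same "@" maps to a different physical key on each layout, which is the
kind of difference a hard-coded US layout hides.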

The only slight wrinkle will be people using a service such as SauceLabs or
BrowserStack across international boundaries --- the remote ends will
probably have a different keyboard layout than the local ends. I'm okay
with that, as my intuition is that in a vast number of cases it won't
matter at all.

*Actions: Pointer events model / implementation strategy*
I know that there's an issue around whether or not intermediate steps
should be interpolated when a user issues a "move" command. My personal view is
to go back to the design principle of attempting to emulate the user as
closely as possible. If the pointer being used would normally generate
intermediate events (such as a mouse would), then "some" should be
generated. If the pointer being used supports "teleporting" from place to
place within the window (eg. a pen, when it's not being dragged) then
there's no requirement to generate intermediate events.

As an implementor, these two approaches boil down to a simple
implementation of the Strategy design pattern, which is selected by the
input device.

How to phrase that in unambiguous spec-ese is tricky, but we have some
folks who are great at writing specs in this group :) I'd leave the
language loose about how many intermediate events are to be generated, and
the path that is followed, to allow implementors freedom, but I'd suggest
that "mouse" pointer type MUST generate intermediate events in all cases,
and others ("pen" and "touch") MUST generate intermediate events when
dragging (that is, moves between a "down" and an "up" of the pointer), and
then leave the rest unspecified.

Obviously, this implies that the last known location within the OS window
or viewport is maintained by the driver, so that a "move" without setting
an initial start point works as expected.
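
A rough sketch of that Strategy selection, assuming straight-line
interpolation and a fixed step count (both arbitrary choices, since the
suggestion above leaves the number of events and the path open). The
names Pointer, interpolate, and teleport are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]


def interpolate(start: Point, end: Point, steps: int) -> List[Point]:
    """Points on a straight line from start (exclusive) to end (inclusive)."""
    (x0, y0), (x1, y1) = start, end
    return [
        (round(x0 + (x1 - x0) * i / steps), round(y0 + (y1 - y0) * i / steps))
        for i in range(1, steps + 1)
    ]


def teleport(start: Point, end: Point, steps: int) -> List[Point]:
    """Jump straight to the target without intermediate events."""
    return [end]


@dataclass
class Pointer:
    pointer_type: str          # "mouse", "pen" or "touch"
    position: Point = (0, 0)   # last known location, kept by the driver
    pressed: bool = False      # True between a "down" and an "up"

    def move_to(self, target: Point, steps: int = 4) -> List[Point]:
        # Mouse always interpolates; pen and touch only while dragging.
        # Teleporting otherwise is one permitted choice, not a requirement.
        must_interpolate = self.pointer_type == "mouse" or self.pressed
        strategy = interpolate if must_interpolate else teleport
        events = strategy(self.position, target, steps)
        self.position = target
        return events
```

Because the pointer remembers its position, a "move" with no explicit start
point simply continues from wherever the previous action ended.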

*New Session: Proposal for non-capabilities top-level items*
Is unnecessary and probably browser specific. There was language in the
spec that says that intermediary nodes should not alter the contents of any
commands issued to them, including not removing fields. One of the
use-cases we discussed (from memory) was decorating commands and responses
with additional information. This proposal falls squarely into that use
case. Requiring additional processing away from the capabilities parameter
seems less than efficient too.

We should provide a mechanism to support number ranges, particularly for
versions, which addresses one concern I've seen. Hopefully some spec,
somewhere defines how to do that :) If not, I guess we can base what we do
on the nsIVersionComparator
<https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/nsIVersionComparator>
from Mozilla (or equivalent Windows utility if there is one)
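
As an illustration only, a version range check could look like the sketch
below. It assumes purely numeric, dot-separated versions, which is much
simpler than the rules nsIVersionComparator applies; parse_version and
version_in_range are hypothetical names.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "47.0.1" into (47, 0, 1), dropping trailing zeros so that
    "47", "47.0" and "47.0.0" all compare equal."""
    parts = [int(part) for part in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def version_in_range(
    version: str,
    minimum: Optional[str] = None,
    maximum: Optional[str] = None,
) -> bool:
    """True if minimum <= version <= maximum. A bound of None is open."""
    parsed = parse_version(version)
    if minimum is not None and parsed < parse_version(minimum):
        return False
    if maximum is not None and parsed > parse_version(maximum):
        return False
    return True
```

Comparing tuples of integers avoids the classic string-comparison trap where
"10.0" sorts before "9.5".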

*New Session: Remote end webdriver-compatible capability.*
Is also unnecessary. The spec should assume that all remote and local ends
speak the w3c dialect of the protocol. Detection of capabilities should be
done by sniffing the capabilities returned by "new session", which means
that a naive implementation of this "webdriver compatibility" (done by
specifying the spec level) would lead people to the hell of determining
what's supported by just looking at that rather than doing things properly.
It's taken the JS community a long time to dig out of that hole and head to
feature sniffing. Let's not put ourselves in it.

As an implementation note, the handshake between remote and local ends
allows each end to determine unambiguously what the other end supports by
the end of the "new session" call:

OSS only: local end sends an object with "desiredCapabilities" and
"requiredCapabilities" fields. Remote end sends a response with a numerical
status

W3C only: local end sends an object with "capabilities" with nested
"desiredCapabilities" and "requiredCapabilities" fields. Remote end sends a
response with a string-based status field, using a different key name.

Bi-dialectual (if that's a word): local end sends a payload with
"capabilities", "desiredCapabilities", and "requiredCapabilities", remote
end selects which dialect to respond with.

Easy.
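
A small sketch of that handshake from the local end's point of view. It
assumes the OSS response carries an integer "status" field; the key name
of the W3C string status is not given above, so the sketch only checks
for the numeric one. The function names are hypothetical.

```python
from typing import Any, Dict


def new_session_payload(
    desired: Dict[str, Any], required: Dict[str, Any]
) -> Dict[str, Any]:
    """Build a "new session" payload readable by either dialect."""
    return {
        # OSS dialect: both fields at the top level.
        "desiredCapabilities": desired,
        "requiredCapabilities": required,
        # W3C dialect as described above: the same fields nested
        # under "capabilities".
        "capabilities": {
            "desiredCapabilities": desired,
            "requiredCapabilities": required,
        },
    }


def detect_dialect(response: Dict[str, Any]) -> str:
    """Guess the dialect the remote end chose to answer in."""
    status = response.get("status")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(status, int) and not isinstance(status, bool):
        return "OSS"
    return "W3C"
```

The local end sends one payload and reads the dialect off the response, so
no separate "webdriver-compatible" capability is needed for detection.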

*Self-signed certs: accepting them implicitly being handled by
a capability?*
Yes! The default should be something that implies that self-signed certs
_are_ accepted (perhaps "strict ssl cert handling", whose truthiness would
default to "false" if the key was missing), because…

*Self-signed certs: should we accept them implicitly?*
Yes. Going back to the audiences, lots of testers have UAT environments
with self-signed certs, and they just want their tests to work. They're
most likely to not have the knowledge or skill to realise that a capability
could or should be set. By accepting the certs implicitly, we reduce
support burden and boilerplate code, while allowing "power users" to do
something more secure.

This would also allow a browser to refuse to handle self-signed certs by
returning the capability set as "true" when creating a new session.
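
A sketch of how a local end might read that capability, assuming a key
spelled "strictSslCertHandling". The spelling is hypothetical; only the
missing-means-false default comes from the notes above.

```python
from typing import Any, Dict

# Hypothetical key name for the "strict ssl cert handling" capability.
STRICT_SSL_KEY = "strictSslCertHandling"


def accepts_self_signed_certs(capabilities: Dict[str, Any]) -> bool:
    """Self-signed certs are accepted unless strict handling is truthy.

    A missing key counts as False, so sessions accept self-signed
    certs by default.
    """
    return not bool(capabilities.get(STRICT_SSL_KEY, False))
```

Run against the capabilities returned by "new session", the same check tells
the tester whether the browser refused to relax its certificate handling.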

*Navigation: malformed URLs?*
The remote ends all have deeply sophisticated and capable URL parsers. A
remote end should be responsible for indicating that a URL is poorly
formed. As such, the answer to whether the spec should allow the user to
navigate to malformed URLs is "it depends on the browser". I guess this
implies that there's a new error status of "invalid url" which the "go"[1]
command can return.

*Navigation: navigate to relative URLs?*
The above answer implicitly suggests the answer here is "we should allow
that", but my gut instinct is that a) it's not necessary (valid URL
manipulation can be done by the user in the programming language of their
choice outside of the local end) and b) it's not been asked for (webdriver
is 9+ years old, and we've never needed to implement this feature), and c)
it's dangerous, since testers may not know where the browser is initially
pointing when they get a relative URL.
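
To illustrate point a), here is one way a tester could resolve a relative
URL before handing it to the navigation command, using Python's standard
urllib.parse. The helper name absolute_url and the example hosts are made
up for the illustration.

```python
from urllib.parse import urljoin, urlsplit


def absolute_url(base: str, target: str) -> str:
    """Resolve `target` against `base` and insist on an absolute result."""
    resolved = urljoin(base, target)
    parts = urlsplit(resolved)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"cannot build an absolute URL from {target!r}")
    return resolved
```

The tester names the base explicitly, so the result never depends on
wherever the browser happens to be pointing.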

*Window Handling:*
Not sure how to read that GH issue, but all top-level browsing contexts
should be addressable via a window handle, even if it's a tab. We used to
have a diagram in section 9.3 about this.

The alternative interpretation is "what happens when something tries to
open a new window?":
1) A "_blank" or named target in an href: of course this should be handled.
2) User emulates via the actions APIs "control clicking" on a link, causing
the browser to open a new window: of course this should also be handled.
3) User sets a break point and opens a new window manually, before
returning control back to the local end: should also be handled, with the
new window appearing in the set of window handles.

I understand that use case 3 might be hard to implement in some browsers,
but it really should be a supported use case. We can leave support for that
use case as undefined or "SHOULD" if one of the browser vendors states that
it's impossible to implement.
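
In all three cases the local end can spot the new window the same way: by
comparing the set of window handles before and after. A minimal sketch,
with made-up handle strings and a hypothetical helper name:

```python
from typing import Iterable, List


def new_window_handles(before: Iterable[str], after: Iterable[str]) -> List[str]:
    """Handles present in `after` but not in `before`, in `after` order."""
    known = set(before)
    return [handle for handle in after if handle not in known]
```

This only works if every top-level browsing context, tab or window, shows
up in the handle list, including ones the user opened by hand.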

Thanks for listening, folks! See you all in Lisbon if not before!

Simon

[1] Historical note: the original command was named after the HTTP verb
"GET". The idea being that later webdriver implementations might have
wanted to implement "POST" or "DELETE" should they want to allow closer
control of http requests.
Received on Saturday, 2 July 2016 09:25:39 UTC