- From: Simon Stewart <simon.m.stewart@gmail.com>
- Date: Sat, 2 Jul 2016 10:25:07 +0100
- To: "public-browser." <public-browser-tools-testing@w3.org>
- Message-ID: <CAOrAhYHB3X=kQry_pB8HQf9uP3B3bt1ueawXkr+m3YiqA0ywAg@mail.gmail.com>
Hi everyone,

There's good news and bad news, and it depends on who you are, because it's the same bit of news. I'm not going to be at the F2F session this time round, though I am planning on attending the meeting at TPAC.

Because I won't be there, I thought it might be useful to put together a (sorry, long) set of notes about the current agenda items. Feel free to scroll down the notes in the sessions: it might make them more bearable :)

I'm currently working on shipping Selenium 3.0. Once we get the beta out the door, my plan is to shift focus on to the spec, since that's where I can have the most positive impact (and I hate multitasking).

Before I begin, there are two key things I bear in mind:

1) The spec's audience. These are "testers", "implementors", and "spec authors". The key thing to note is that "testers" is the largest group by far, and is a broad and diverse group, often including people with limited or no control of their machines (that is, no admin access), and relatively frequently with weak coding skills.

2) The design choices I laid out in the AOSA book <http://www.aosabook.org/en/selenium.html>. Most important of these is that the webdriver APIs were designed to emulate the user as closely as possible. That's why the tool has continued to work as apps get more sophisticated, and how we kept complexity out of the local-end APIs.

So, without further ado:

*Actions: Key Event keyboard layout issues*

The original design for this split keyboard input via the "do what I mean" send-keys command into two paths. The simpler path allowed simple testing of i18n, and spammed the Unicode characters into the event stream. It would have been nice to do that by pretending that they were copied and pasted, or provided via an IME, but I'm not that good a native coder in Windows. The more complex but common path looked up key codes from the OS and attempted to honour the current keyboard layout. That seemed like the best idea, and it still does to me.
Of course, the JS implementations had no idea what the current keyboard layout was, so they defaulted to a 10-some-number US keyboard layout. I suspect that choice is causing some confusion. I think we can now do a lot better, and believe that mapping the inputs to the current keyboard is a sane thing to do.

The only slight wrinkle will be people using a service such as SauceLabs or BrowserStack across international boundaries --- the remote ends will probably have a different keyboard layout than the local ends. I'm okay with that, as my intuition is that in a vast number of cases it won't matter at all.

*Actions: Pointer events model / implementation strategy*

I know that there's an issue around whether or not intermediate steps should be interpolated when a "move" command is issued. My personal view is to go back to the design principle of attempting to emulate the user as closely as possible. If the pointer being used would normally generate intermediate events (such as a mouse would), then "some" should be generated. If the pointer being used supports "teleporting" from place to place within the window (eg. a pen, when it's not being dragged) then there's no requirement to generate intermediate events.

As an implementor, these two approaches boil down to a simple implementation of the Strategy design pattern, which is selected by the input device. How to phrase that in unambiguous spec-ese is tricky, but we have some folks who are great at writing specs in this group :)

I'd leave the language loose about how many intermediate events are to be generated, and the path that is followed, to allow implementors freedom, but I'd suggest that the "mouse" pointer type MUST generate intermediate events in all cases, and others ("pen" and "touch") MUST generate intermediate events when dragging (that is, moves between a "down" and an "up" of the pointer), and then leave the rest unspecified.
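As a rough illustration of the Strategy selection described above, here is a minimal Python sketch. Every name in it (InterpolatedMove, TeleportMove, strategy_for, Pointer) is hypothetical and not taken from any driver or from the spec, and the straight-line path with five steps is an arbitrary choice, since the number of intermediate events and the path are meant to stay unspecified.

```python
from typing import List, Tuple

Point = Tuple[int, int]


class InterpolatedMove:
    """Emits intermediate points along a straight line, ending at the target."""

    def __init__(self, steps: int = 5) -> None:
        self.steps = steps  # arbitrary; the spec language would leave this open

    def path(self, start: Point, end: Point) -> List[Point]:
        (x0, y0), (x1, y1) = start, end
        return [
            (x0 + (x1 - x0) * i // self.steps, y0 + (y1 - y0) * i // self.steps)
            for i in range(1, self.steps + 1)
        ]


class TeleportMove:
    """Jumps straight to the target with no intermediate points."""

    def path(self, start: Point, end: Point) -> List[Point]:
        return [end]


def strategy_for(pointer_type: str, is_down: bool):
    # "mouse" always interpolates; "pen" and "touch" only while dragging.
    if pointer_type == "mouse" or is_down:
        return InterpolatedMove()
    return TeleportMove()


class Pointer:
    """Hypothetical driver-side pointer that remembers its last location."""

    def __init__(self, pointer_type: str) -> None:
        self.pointer_type = pointer_type
        self.position: Point = (0, 0)
        self.is_down = False

    def down(self) -> None:
        self.is_down = True

    def up(self) -> None:
        self.is_down = False

    def move(self, target: Point) -> List[Point]:
        strategy = strategy_for(self.pointer_type, self.is_down)
        events = strategy.path(self.position, target)
        self.position = target
        return events
```

With this sketch, a Pointer("mouse") returns several points for every move, while a Pointer("pen") returns only the target until down() has been called.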
Obviously, this implies that the last known location within the OS window or viewport is maintained by the driver, so that a "move" without setting an initial start point works as expected.

*New Session: Proposal for non-capabilities top-level items*

Is unnecessary and probably browser specific. There was language in the spec that says that intermediary nodes should not alter the contents of any commands issued to them, including not removing fields. One of the use-cases we discussed (from memory) was decorating commands and responses with additional information. This proposal falls squarely into that use case. Requiring additional processing away from the capabilities parameter seems less than efficient too.

We should provide a mechanism to support number ranges, particularly for versions, which addresses one concern I've seen. Hopefully some spec, somewhere defines how to do that :) If not, I guess we can base what we do on the nsIVersionComparator <https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/nsIVersionComparator> from Mozilla (or equivalent Windows utility if there is one).

*New Session: Remote end webdriver-compatible capability.*

Is also unnecessary. The spec should assume that all remote and local ends speak the w3c dialect of the protocol. Detection of capabilities should be done by sniffing the capabilities returned by "new session", which means that a naive implementation of this "webdriver compatibility" (done by specifying the spec level) would lead people to the hell of determining what's supported by just looking at that rather than doing things properly. It's taken the JS community a long time to dig out of that hole and head to feature sniffing. Let's not put ourselves in it.
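To make the "feature sniffing" point concrete, here is a minimal Python sketch of a local end checking the capabilities returned by "new session" instead of inferring support from a spec level. The helper name supports and the sample capability keys and values are assumptions made for illustration; the real set of capability names is whatever the remote end reports.

```python
from typing import Any, Mapping


def supports(returned_capabilities: Mapping[str, Any], feature: str) -> bool:
    """True only if the remote end explicitly reported the feature as truthy.

    A missing key is treated as "not supported" rather than guessed from a
    version or spec-level number.
    """
    return bool(returned_capabilities.get(feature, False))


# Hypothetical capabilities object as returned by a "new session" call.
returned = {
    "browserName": "firefox",
    "takesScreenshot": True,
    "rotatable": False,
}

if supports(returned, "takesScreenshot"):
    print("screenshots available")
```

The design choice is that each feature is checked individually, so a remote end can drop or add a feature without the local end needing to know which spec level introduced it.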
As an implementation note, the handshake between remote and local ends allows each end to determine unambiguously what the other end supports by the end of the "new session" call:

OSS only: local end sends an object with "desiredCapabilities" and "requiredCapabilities" fields. Remote end sends a response with a numerical status.

W3C only: local end sends an object with "capabilities" with nested "desiredCapabilities" and "requiredCapabilities" fields. Remote end sends a response with a string-based status field, using a different key name.

Bi-dialectual (if that's a word): local end sends a payload with "capabilities", "desiredCapabilities", and "requiredCapabilities", remote end selects which dialect to respond with.

Easy.

*Self-signed certs: accepting them implicitly being handled by a capability?*

Yes! The default should be something that implies that self-signed certs _are_ accepted (perhaps "strict ssl cert handling", whose truthiness would default to "false" if the key was missing), because…

*Self-signed certs: should we accept them implicitly?*

Yes. Going back to the audiences, lots of testers have UAT environments with self-signed certs, and they just want their tests to work. They're most likely to not have the knowledge or skill to realise that a capability could or should be set. By accepting the certs implicitly, we reduce support burden and boilerplate code, while allowing "power users" to do something more secure. This would also allow a browser to refuse to handle self-signed certs by returning the capability set as "true" when creating a new session.

*Navigation: malformed URLs?*

The remote ends all have deeply sophisticated and capable URL parsers. A remote end should be responsible for indicating that a URL is poorly formed. As such, the answer to whether the spec should allow the user to navigate to malformed URLs is "it depends on the browser".
I guess this implies that there's a new error status of "invalid url" which the "go"[1] command can return.

*Navigation: navigate to relative URLs?*

The above answer implicitly suggests the answer here is "we should allow that", but my gut instinct is that a) it's not necessary (valid URL manipulation can be done by the user in the programming language of their choice outside of the local end), b) it's not been asked for (webdriver is 9+ years old, and we've never needed to implement this feature), and c) it's dangerous, since testers may not know where the browser is initially pointing when they get a relative URL.

*Window Handling:*

Not sure how to read that GH issue, but all top-level browsing contexts should be addressable via a window handle, even if it's a tab. We used to have a diagram in section 9.3 about this.

The alternative interpretation is "what happens when something tries to open a new window?":

1) A "_blank" or named target in an href: of course this should be handled.

2) User emulates via the actions APIs "control clicking" on a link, causing the browser to open a new window: of course this should also be handled.

3) User sets a break point and opens a new window manually, before returning control back to the local end: should also be handled, with the new window appearing in the set of window handles.

I understand that use case 3 might be hard to implement in some browsers, but it really should be a supported use case. We can leave support for that use case as undefined or "SHOULD" if one of the browser vendors states that it's impossible to implement.

Thanks for listening, folks! See you all in Lisbon if not before!

Simon

[1] Historical note: the original command was named after the HTTP verb "GET". The idea being that later webdriver implementations might have wanted to implement "POST" or "DELETE" should they want to allow closer control of HTTP requests.
Received on Saturday, 2 July 2016 09:25:39 UTC