Re: 3 interesting bugs from Brian Burg on 2016-09-27 (public-browser-tools-testing@w3.org from July to September 2016)

From: Brian Burg <bburg@apple.com>
Date: Tue, 27 Sep 2016 12:41:37 -0700
To: Andreas Tolfsen <ato@mozilla.com>
Cc: "public-browser-tools-testing@w3.org" <public-browser-tools-testing@w3.org>, Clay Martin <clmartin@microsoft.com>
Message-id: <EC3E547A-F5FB-4D16-8647-CF55D5367AED@apple.com>

Let me pile on here with some comments.
 
> On Sep 27, 2016, at 6:11 AM, Andreas Tolfsen <ato@mozilla.com> wrote:
> 
> Hei Clay,
> 
> Clay Martin <clmartin@microsoft.com> writes:
> 
>> Bug #1:
>> https://developer.microsoft.com/en-us/microsoft-edge/platform/issues/8073490/
>> This has to do with opening external applications. In Edge this
>> causes webdriver to hang (we plan to address this) but I realized the
>> specification doesn't really handle what happens if the tester does
>> something that would otherwise pull the user out of the context of
>> the browser. Not a big deal at the moment for us as we plan to just
>> return in this case.
> 
> I think Simon covered this, but with regards to the specific bug you
> linked, there is actually no prose in the spec to cover what should
> happen when interacting with <input type=file> elements currently.  I
> intend to fix that this week.
> 
> The brief summary is that sending keys to <input type=file> should
> trigger implementation-specific steps to ensure the first element of
> the `files` property is replaced with a new File object.  If sending
> keys to <input type=file multiple>, another File object should be
> appended to the `files` property.
> 
> Give me a few days to write this down in specalese.

Currently, safaridriver has no support for <input type=file>. I looked into how we might do it, and am worried about it being a vector for session hijacking or exfiltration of user data stored at well-known filesystem locations. Have any other implementors added restrictions to mitigate this? One thing I was thinking about is, at session creation time, having a capability key to set the root directory from which files can be uploaded. File opening would of course be subject to any OS-level file modes or ACLs. This still requires some care on the part of the user, but should make this a much less tempting target for malicious use. Whether or not such a mechanism would be mentioned in the specification is something I can't answer, as such a policy seems inherently implementation-defined. What do you think?
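
As a rough illustration of the root-directory idea above, here is a minimal Python sketch of the path check such a restriction implies. The function name `resolve_upload_path` and the `upload_root` parameter are hypothetical; no capability with this name exists in safaridriver or in the specification. The sketch only shows how a requested path could be confined to a configured root, and OS-level file modes and ACLs would still apply when the file is actually opened.

```python
from pathlib import Path


def resolve_upload_path(upload_root: str, requested: str) -> Path:
    """Return the resolved path for `requested` if it lies inside `upload_root`.

    Hypothetical helper: `upload_root` stands in for a value supplied
    through a capability at session creation time.
    """
    root = Path(upload_root).resolve()
    # Joining with an absolute `requested` discards `root`, so absolute
    # paths outside the root are rejected by the check below as well.
    candidate = (root / requested).resolve()
    try:
        # Raises ValueError when `candidate` is not located under `root`.
        candidate.relative_to(root)
    except ValueError:
        raise PermissionError(f"{requested!r} is outside the upload root")
    return candidate
```

Because both paths are resolved before comparison, `..` segments and symlinks that lead out of the root are rejected rather than followed.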

I also really dislike shoehorning this into sendKeys for obvious reasons, but I think that ship has sailed for v1.

> 
>> Bug #2:
>> https://developer.microsoft.com/en-us/microsoft-edge/platform/issues/8515651/
>> This covers an issue I brought up at the July F2F around the various
>> oddball inputs. We have issues "sending keys" to these as they are
>> somewhat special, so thoughts on this would be interesting. I'm
>> guessing value should just be set and any relevant events fired but
>> wanted to know thoughts.
> 
> We covered this at our F2F in Boston.  Instead of sending key strokes
> as we do for <input type=text> and <textarea>, we will treat all HTML
> (5) input elements such as <input type=color> et al. by setting their
> `value` attributes directly to the supplied string.
> 
> This means sending "#FFB6C1" to <input type=color> will cause its
> internal colour state to change to pink.  Or to put it more simply, we
> delegate to what the HTML spec tells the browser to do when setting
> the property.
> 
> Again, this is also uncovered in the current specification draft.  As
> above I will fix that this week.

This seems like a reasonable way to change the values. It's unclear what would happen if the automation command were to click on, say, a color picker widget. Should it just be suppressed from presenting, since it might be an OS-level dialog?
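
To make the proposed behaviour concrete, here is a small Python model of the dispatch described in the quoted text: file inputs replace or append an entry in `files`, text-like controls receive key strokes, and every other input type has its `value` set directly. `FakeInput`, `send_keys`, and `KEYSTROKE_TYPES` are hypothetical names for this sketch. The set of types that receive key strokes and the event names recorded are assumptions, since the thread only says "any relevant events". A real browser would also apply whatever the HTML spec requires when the property is set, which this model does not attempt.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: only <input type=text> (plus <textarea>) gets key strokes.
KEYSTROKE_TYPES = {"text"}


@dataclass
class FakeInput:
    """Toy stand-in for a form control; not a real DOM element."""
    tag: str = "input"
    type: str = "text"
    multiple: bool = False
    value: str = ""
    files: List[str] = field(default_factory=list)
    events: List[str] = field(default_factory=list)


def send_keys(el: FakeInput, text: str) -> None:
    if el.tag == "input" and el.type == "file":
        if el.multiple:
            el.files.append(text)      # append another file
        else:
            el.files[:1] = [text]      # replace (or set) the first file
        el.events.append("change")     # assumed event name
    elif el.tag == "textarea" or el.type in KEYSTROKE_TYPES:
        for ch in text:                # one simulated key stroke per character
            el.value += ch
            el.events.append("keypress")
    else:
        el.value = text                # e.g. <input type=color>
        el.events += ["input", "change"]  # assumed event names
```

For example, calling `send_keys(FakeInput(type="color"), "#FFB6C1")` stores the string as the value in a single step, with no per-character key events.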

> 
>> Bug #3:
>> https://developer.microsoft.com/en-us/microsoft-edge/platform/issues/8074852/
>> This final one tackles which tab is active vs which webdriver tab is
>> active. For Edge after doing a window.open() the active tab is the
>> new tab while the webdriver tab hasn't changed (no switch to window
>> call). If you execute a get command, the active tab will switch back
>> to the opener which is also the webdriver active tab and then do a
>> navigate. In Chrome it allows the get to happen in the webdriver tab
>> without it being in focus. As this could change what a site does
>> (being displayed vs not displayed) it seems like an interesting case
>> to consider.
> 
> WebDriver has its own idea of what the current top-level browsing
> context is, and this does not necessarily correspond to whichever one
> is currently focussed or active in the browser UI.
> 
> This relates back to what I said in my other reply, that in a WebDriver
> context it does not matter whether the browser has focus or not.
> 
> Because of the stateful nature of the WebDriver protocol I think users
> would find it rather confusing if their idea of which top-level
> browsing context they were interacting with suddenly changed without
> notice.  I can certainly imagine scenarios where a popup would appear
> that would cause automation steps to be incredibly tedious to write.
> 
> Because the WebDriver protocol isn’t a duplex protocol and does not
> have locks, there is no way we could guarantee race condition-free
> behaviour if the current top-level browsing context was linked to
> whichever one is in focus.
> 

safaridriver keeps track of the "WebDriver" focused browsing context separately from the UI state. However, it does cause Safari to take focus when simulating user inputs. This is a restriction of the windowing system and the fact that safaridriver sends simulated OS-level events to the browser application, which do nothing if the correct application window isn't focused. This focusing is done automatically so users don't need to worry about it.
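
The separation described above can be sketched with a toy Python model. `FakeBrowser` and `Session` are hypothetical classes written for this illustration and do not reflect safaridriver's actual implementation. The session keeps its own current window handle, navigation leaves UI focus alone, and UI focus is only moved when user input is simulated.

```python
class FakeBrowser:
    """Toy browser: a list of window handles plus one UI-focused handle."""

    def __init__(self):
        self.windows = ["w1"]
        self.ui_focused = "w1"

    def open_window(self, handle):
        self.windows.append(handle)
        self.ui_focused = handle  # e.g. window.open() focuses the new tab


class Session:
    """Toy WebDriver session with its own notion of the current window."""

    def __init__(self, browser):
        self.browser = browser
        self.current = browser.ui_focused

    def switch_to_window(self, handle):
        if handle not in self.browser.windows:
            raise LookupError(f"no such window: {handle}")
        self.current = handle

    def get(self, url):
        # Navigation targets the session's window; UI focus is untouched.
        return (self.current, url)

    def send_input(self, keys):
        # Simulated OS-level events need the target window focused first.
        self.browser.ui_focused = self.current
        return (self.current, keys)
```

In this model, opening a second window changes `ui_focused` but not `Session.current`, so a later `get` still targets the original window until `switch_to_window` is called.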
Received on Tuesday, 27 September 2016 19:42:08 UTC