Re: Units of measurement and scrolling in actions

On 12/01/17 10:41, David Burns wrote:
> tl;dr; The only size we can use is CSS pixels as that is what browsers
> know. More answers inline.
>
> David
>
> On 12 January 2017 at 09:26, Simon Stewart <simon.m.stewart@gmail.com>
> wrote:
>
>> Hi,
>>
>> TL;DR: should units for size and distance be consistent throughout the
>> spec? And if a scroll is needed when using Actions, how should that be
>> specified, or is it implicit?
>>
>> Long version:
>>
>> While reviewing James's PR for pointer events
>> <https://github.com/w3c/webdriver/pull/495>, I noticed three things.
>>
>> 1/ We have no way of knowing the size of the viewport, or where an element
>> is within the viewport.

Except by executing JavaScript.
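
For illustration, a minimal sketch of how a local end might do that today. The helper name is hypothetical, `driver` is assumed to be any object with an `execute_script(script, *args)` method (as in Selenium's Python bindings), and `documentElement.clientWidth`/`clientHeight` is assumed to be an acceptable definition of viewport size (standards mode, scrollbars excluded):

```python
# Sketch only: `driver` is assumed to expose execute_script(script, *args),
# as Selenium's Python bindings do. The helper name is made up for this example.
VIEWPORT_SIZE_JS = (
    "return {width: document.documentElement.clientWidth,"
    " height: document.documentElement.clientHeight};"
)


def get_viewport_size(driver):
    """Return (width, height) of the viewport in CSS pixels."""
    size = driver.execute_script(VIEWPORT_SIZE_JS)
    return size["width"], size["height"]
```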

>> 2/ We have no way of knowing the size of an element within the viewport in
>> anything other than CSS reference pixels.

I don't think there are other units that make sense.

>> 3/ We have no text on how to handle the case where an element is outside
>> of the viewport.

In what situation? For pointer move that should return "move target out 
of bounds" with the new PR.

>> In order to help give some context to the discussion, consider three
>> separate use-cases:
>>
>> A/ A user expects "get element rect, calculate half the width, perform
>> pointer move to element, perform second pointer move by that half width" to
>> be the same as "get element rect, calculate half the width, perform a
>> single pointer move with element and xoffset of the half width" to cause
>> the pointer to end in the same place.
>>
>> B/ A series of interactions begins starting from element A and ending at
>> element B, whose final x/y location is determined algorithmically and isn't
>> known in advance. Until the interactions begin, element B is not within
>> the viewport, and the size of the viewport is unknown --- on local test
>> runs, the display is 2880 x 1800, but when running on a "webdriver as a
>> service" provider, the screen size is 1024 x 768.

Note that we currently resolve element coordinates only when an action 
is dispatched. So if you have a situation where you have multiple 
actions and the element is not in view until action M < N but is the 
target of a pointer move in action N, that all works fine.

What doesn't work is the case where a single pointer move action both
causes an element to appear in the viewport and targets that element.
But I think that's a totally reasonable restriction; if the browser
doesn't know what the target is until the move starts, what should be
the initial vector of the move? To resolve the ambiguity the author must
split the move into two parts, one with a manually specified target, and
one with the element as the target.
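
A rough sketch of what such a two-part move could look like on the wire. The field names (`origin`, `duration`, the web element reference key) follow the shape the WebDriver spec later settled on and may not match the PR under discussion; the function name, the `mouse1` id and the default duration are placeholders for this example:

```python
# Web element reference key as defined in the WebDriver spec.
ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def two_step_move(x, y, element_id, duration_ms=100):
    """Build an actions payload: move to a viewport point, then to an element."""
    return {
        "actions": [{
            "type": "pointer",
            "id": "mouse1",  # arbitrary input source id (placeholder)
            "parameters": {"pointerType": "mouse"},
            "actions": [
                # Part 1: manually specified target in viewport coordinates.
                {"type": "pointerMove", "duration": duration_ms,
                 "origin": "viewport", "x": x, "y": y},
                # Part 2: element target; its coordinates are only resolved
                # when this action is dispatched.
                {"type": "pointerMove", "duration": duration_ms,
                 "origin": {ELEMENT_KEY: element_id}, "x": 0, "y": 0},
            ],
        }]
    }
```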

>>
>> C/ A user wants to start the pointer move in one frame, and end in
>> another, performing a drag of (for example) an email into (for example) a
>> folder of a web-based email app.
>>
>> Breaking these down, "a" and "2" show that we have a problem with the
>> units used for specifying distances and sizes in webdriver. Most of the
>> time, it's CSS reference pixels, but in Actions, we flip to using locations
>> within viewports. We don't provide a mechanism to translate between the
>> two. It would feel that consistently using CSS reference pixels throughout
>> would be simpler for an end-user to understand, though more complex to
>> implement at the remote end (since you now need to convert from reference
>> pixels to a clientX/Y)

I think you are misunderstanding the use of "CSS [reference] pixels". 
They are just a unit, more or less the foundational unit of layout on 
the web. A particular set of coordinates has both a unit and an origin; 
the underlying issue here seems to be the use of different coordinate 
origins in different parts of the WebDriver protocol.
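
To make the unit/origin distinction concrete: both coordinate systems below are in CSS pixels, and only the origin differs. This is a sketch with made-up function names; `scroll_x` and `scroll_y` are assumed to be the page's scroll offsets (e.g. `window.scrollX` and `window.scrollY`):

```python
def document_to_viewport(doc_x, doc_y, scroll_x, scroll_y):
    """Shift a document-origin point to viewport-origin; units are unchanged."""
    return doc_x - scroll_x, doc_y - scroll_y


def viewport_to_document(view_x, view_y, scroll_x, scroll_y):
    """Inverse shift: viewport-origin back to document-origin."""
    return view_x + scroll_x, view_y + scroll_y
```

For example, a point at document position (500, 1200) on a page scrolled down by 1000 CSS pixels sits at (500, 200) relative to the viewport.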

>> However, I'm not sure whether "c" would complicate using css reference
>> pixels: what if a user had changed the zoom level in one frame but not the
>> other? Should we even allow drag motions between frames?

CSS pixels get larger when the user zooms.

> At TPAC Shenzhen we decided that this was not a use case (C) we were going
> to support. Notes at
> https://www.w3.org/2013/11/11-testing-minutes.html#item12. There are
> possible security sandboxing issues. There is also the issue of doing an
> implicit switch_to_frame to the new frame, doing the relevant lookup for
> the element, and what to do if it's stale. When the Action Chain is finished,
> which frame do you end on? Since there is an implicit frame switch, people could
> be expecting either case and this could lead to a footgun.
>
>
>>
>> It also seems clear that we need some mechanism to cause a scroll to
>> happen mid-way through a series of (pointer) actions. We could do this
>> implicitly (which would make "b" possible), by asking someone to specify a
>> scroll action (from the null input device?), with a delta and an optional
>> target element (which also makes "b" possible), or by returning some kind
>> of error stating that scrolling would be necessary to complete the action
>> (which may make "b" impossible).
>>
>>
> In Shenzhen, we said we didn't need such an API (however the new Actions API
> wasn't on the table at that point). We did, however, in SF
> https://www.w3.org/2014/02/25-testing-minutes.html talk about scrolling to
> elements for different commands and how it would be good to turn this on
> and off. Perhaps this needs to be either an Actions "task" or it needs to
> be a property in the actions blob sent over the wire. I don't mind either
> way.

I am in favour of making scroll a primitive action rather than something
implicit. Indeed, actions with an input type of "none" were designed
specifically with this future extension in mind. I think this
is necessary for use cases like infinite scrolling where the page is
expected to dynamically resize. That said, I think it should be a future
extension and not something we do right now.

>> My ideal outcome as a user would be:
>>
>> * All distances and sizes are always given in CSS reference pixels.
>> * Scrolling happens thanks to a "scroll action" added to the events, or
>> when a user specifies a target element in another action.

So, I think what you mean here is "everything is in document-origin 
coordinates". I don't really see the advantage of this. For moving to an 
element the coordinate system isn't very relevant. For moving to a 
specific point viewport origin coordinates seem easier to reason about 
because you know that any location in the range (0->width,0->height) is 
a valid coordinate, without having to consider scroll position. This 
seems much more natural for the gestures use cases, and other things 
that don't involve interaction with specific elements, than 
document-origin coordinates where you always need to adjust for scroll 
position.
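
A small sketch of the difference in the checks involved. The function names are made up, and treating both edges of the range as inclusive is an assumption:

```python
def in_viewport(x, y, width, height):
    """Viewport-origin point: valid if it lies in (0->width, 0->height)."""
    return 0 <= x <= width and 0 <= y <= height


def document_point_in_viewport(doc_x, doc_y, scroll_x, scroll_y, width, height):
    """Document-origin point: must be adjusted for scroll position first."""
    return in_viewport(doc_x - scroll_x, doc_y - scroll_y, width, height)
```

With viewport-origin coordinates the check needs only the viewport size; with document-origin coordinates the same question cannot be answered without also knowing the current scroll offsets.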

>> A painful but possibly workable solution would be:
>>
>> * Provide a mechanism to get the current viewport size.
>>
>
> Within the Actions commands? Why can't we just use #executeScript for this?
>
>
>> * Provide a mechanism to get the size of the currently active frame in the
>> viewport.

I agree that there should be a way (not as part of actions) to get the 
size of the viewport. The fact that get window size includes window 
chrome seems to me to be a clear bug that I wouldn't put in the spec 
except for legacy compat concerns. In particular I can't see a use case 
where knowing the actual size of the OS window, instead of the size of 
the content area, is important.

I'm not sure what the use case for the size of the frame is.

> Again, where would we put this command and why can't we use #executeScript?
>
>
>> * Add additional properties to "get element rect" to return the client
>> x/y/width/height of the element, assuming that it was scrolled into the
>> current viewport.
>>
>
> It already returns that information. It doesn't return viewport positions,
> unless you are using #executeScript and using the JS
> element#getBoundingClientRect()

Well, what it returns at the moment is the position in document-origin
coordinates. "The x,y assuming that it was scrolled into the viewport"
doesn't make much sense, because an element could have many locations
whilst being scrolled into the viewport.

I agree that a primitive allowing viewport-origin coordinates might be
an improvement.
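
Until such a primitive exists, the viewport-origin rect can be approximated from the local end with Execute Script. This is a sketch, not a proposed API: the helper name is hypothetical, and `driver` is assumed to be any object with an `execute_script(script, *args)` method that accepts an element reference as an argument:

```python
# Copy the fields out explicitly rather than relying on how a DOMRect
# happens to be serialized by the remote end.
ELEMENT_VIEWPORT_RECT_JS = (
    "var r = arguments[0].getBoundingClientRect();"
    " return {x: r.left, y: r.top, width: r.width, height: r.height};"
)


def get_element_viewport_rect(driver, element):
    """Return the element's rect relative to the viewport, in CSS pixels."""
    return driver.execute_script(ELEMENT_VIEWPORT_RECT_JS, element)
```

Note that x and y can be negative or exceed the viewport size when the element is scrolled out of view.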

>> * Provide a scaling factor for converting between CSS reference pixels and
>> client position
>>
>
> Historically, we haven't supported people changing the scaling in their app
> and told them they need to fix it. See IEDriver as an example.
>
>
>> * Make local ends do the maths for users
>>
>
> Fine by me.
>
>
>> * Make scrolling explicit.
>>
>
> Fine by me
>
>
>>
>> The former seems simpler from a local end PoV, but I'm unsure how much
>> work it would take at the remote end.
>>
>> I've come round to the idea scrolling should not be implicit, since it
>> makes use case "c" a PITA to implement.

Is there anything that wouldn't be fixed by making "Get Window Size" 
optionally exclude the browser chrome (i.e. return the size of the 
content area), Get Element Rect optionally return viewport coordinates, 
and eventually adding a scrolling primitive to actions?

I think these things are all possible and not too hard. I don't think 
any of them are high priority however.

Received on Thursday, 12 January 2017 13:09:26 UTC