Units of measurement and scrolling in actions

Hi,

TL;DR: should units for size and distance be consistent throughout the
spec? And if a scroll is needed when using Actions, how should that be
specified, or is it implicit?

Long version:

While reviewing James's PR for pointer events
<https://github.com/w3c/webdriver/pull/495>, I noticed three things.

1/ We have no way of knowing the size of the viewport, or where an element
is within the viewport.

2/ We have no way of knowing the size of an element within the viewport in
anything other than CSS reference pixels.

3/ We have no text on how to handle the case where an element is outside of
the viewport.

In order to help give some context to the discussion, consider three
separate use-cases:

A/ A user expects "get element rect, calculate half the width, perform
pointer move to element, perform second pointer move by that half width" to
leave the pointer in the same place as "get element rect, calculate half
the width, perform a single pointer move with element and xoffset of the
half width".

B/ A series of interactions starts from element A and ends at element B,
whose final x/y location is determined algorithmically and isn't known in
advance. Until the interactions begin, element B is not within the
viewport, and the size of the viewport is unknown: on local test runs, the
display is 2880 x 1800, but when running on a "webdriver as a service"
provider, the screen size is 1024 x 768.

C/ A user wants to start the pointer move in one frame, and end in another,
performing a drag of (for example) an email into (for example) a folder of
a web-based email app.
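
To make use case A concrete, here is a rough Python sketch of the two
action lists a local end might build. The field names ("origin", "x", "y")
and the element reference key follow the later published WebDriver spec and
may not match the draft in the PR; the rect values and the element id are
made up. It only shows that the two lists agree when every offset is in the
same unit, which is exactly the assumption in question.

```python
# Key the published WebDriver spec uses to mark an element reference.
# Assumption: the draft under discussion may spell this differently.
ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def two_moves(element_id, half_width):
    """Move to the element, then move again relative to the pointer."""
    element = {ELEMENT_KEY: element_id}
    return [
        {"type": "pointerMove", "origin": element, "x": 0, "y": 0},
        {"type": "pointerMove", "origin": "pointer", "x": half_width, "y": 0},
    ]


def one_move(element_id, half_width):
    """Single move to the element with an x offset."""
    element = {ELEMENT_KEY: element_id}
    return [{"type": "pointerMove", "origin": element, "x": half_width, "y": 0}]


def net_x_offset(actions):
    """Net x offset from the element's origin, assuming one shared unit."""
    return sum(action["x"] for action in actions)


# Made-up "get element rect" result, in CSS reference pixels.
rect = {"x": 10, "y": 20, "width": 200, "height": 50}
half = rect["width"] // 2

print(net_x_offset(two_moves("e1", half)), net_x_offset(one_move("e1", half)))
```

If the offsets in Actions are not in the same unit as the rect, the value
of "half" no longer corresponds to half the element, whichever list is
used.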

Breaking these down, "A" and "2" show that we have a problem with the units
used for specifying distances and sizes in webdriver. Most of the time,
it's CSS reference pixels, but in Actions, we flip to using locations
within viewports. We don't provide a mechanism to translate between the
two. It feels as though consistently using CSS reference pixels throughout
would be simpler for an end-user to understand, though more complex to
implement at the remote end (since you now need to convert from reference
pixels to a clientX/Y).

However, I'm not sure whether "C" would complicate using CSS reference
pixels: what if a user had changed the zoom level in one frame but not the
other? Should we even allow drag motions between frames?

It also seems clear that we need some mechanism to cause a scroll to happen
mid-way through a series of (pointer) actions. We could do this in one of
three ways: implicitly (which would make "B" possible); by asking someone
to specify a scroll action (from the null input device?), with a delta and
an optional target element (which also makes "B" possible); or by returning
some kind of error stating that scrolling would be necessary to complete
the action (which may make "B" impossible).
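
As a strawman for the second option, an explicit scroll action might look
something like the Python sketch below. Everything here is hypothetical:
the "scroll" action type, the "deltaX"/"deltaY"/"element" field names and
the helper functions are made up for illustration and are not in the spec.
Only the idea of a delta, an optional target element and the null input
device comes from the paragraph above.

```python
def scroll_action(delta_x, delta_y, element_id=None):
    """Build a hypothetical scroll action with an optional target element."""
    action = {"type": "scroll", "deltaX": delta_x, "deltaY": delta_y}
    if element_id is not None:
        # Hypothetical field: scroll until this element is in the viewport.
        action["element"] = element_id
    return action


def null_device_sequence(actions, device_id="scroller"):
    """Wrap actions in a sequence for the null input device (type "none")."""
    return {"type": "none", "id": device_id, "actions": actions}


# Scroll down by 400 towards a made-up element id, mid-way through a chain.
sequence = null_device_sequence([
    {"type": "pause", "duration": 0},
    scroll_action(0, 400, element_id="element-b"),
])
print(sequence)
```

The units of the delta are left open here, since that is the other half of
the question.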

My ideal outcome as a user would be:

* All distances and sizes are always given in CSS reference pixels.
* Scrolling happens thanks to a "scroll action" added to the events, or
when a user specifies a target element in another action.

A painful but possibly workable solution would be:

* Provide a mechanism to get the current viewport size.
* Provide a mechanism to get the size of the currently active frame in the
viewport.
* Add additional properties to "get element rect" to return the client
x/y/width/height of the element, assuming that it was scrolled into the
current viewport.
* Provide a scaling factor for converting between CSS reference pixels and
client position.
* Make local ends do the maths for users.
* Make scrolling explicit.

The former seems simpler from a local end PoV, but I'm unsure how much work
it would take at the remote end.
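
For the second option, the maths a local end would have to do could be as
small as the Python sketch below. The scaling factor is the hypothetical
value from the list above (no command returns it today), and the function
names are invented for illustration. It only covers lengths, not scroll
position or frame offsets.

```python
def css_to_client_length(css_px, scale):
    """Convert a length in CSS reference pixels to client units.

    `scale` is the hypothetical scaling factor the remote end would expose.
    """
    return round(css_px * scale)


def half_width_offset(css_rect, scale):
    """Offset to use in a pointer move to travel half the element's width."""
    return css_to_client_length(css_rect["width"] / 2, scale)


# Made-up "get element rect" result, in CSS reference pixels.
rect = {"x": 10, "y": 20, "width": 200, "height": 50}

print(half_width_offset(rect, 1.0))
print(half_width_offset(rect, 1.5))
```

With a factor of 1.0 the user's arithmetic from use case A works unchanged;
with any other factor the local end has to apply the conversion before
building the actions.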

I've come round to the idea that scrolling should not be implicit, since it
makes use case "C" a PITA to implement.

Thoughts?

Simon

Received on Thursday, 12 January 2017 09:27:17 UTC