Re: Actions questions from James Graham on 2016-04-29 (public-browser-tools-testing@w3.org from April to June 2016)

From: James Graham <james@hoppipolla.co.uk>
Date: Fri, 29 Apr 2016 16:56:25 +0100
To: public-browser-tools-testing@w3.org
Message-ID: <57238429.4060906@hoppipolla.co.uk>
On 20/04/16 21:08, David Burns wrote:
>     Inside this top-level list, the spec suggests further lists, but I
>     have a diagram here that suggests an object:
>
>     {"type": "key",
>       "id": "1",
>       "actions": [{"name": "keyDown",
>                    "code": "a"},
>                   {"name": "keyUp",
>                    "code": "a"}
>                  ]
>     }
>
>     Compared to what's in the spec, this allows an id attribute, which
>     is needed for e.g. multi-touch. But it's more constrained in that
>     each sequence of actions can only refer to a specific device, so if
>     I wanted to have some pointer actions and some key actions in a
>     sequence, I would have to send multiple chains, padding with pause
>     actions. Does it make sense to put the type and id on each action
>     entry, like:
>
>     [{"type": "key",
>       "id": "1",
>       "name": "keyDown",
>       "code": "a"},
>      {"type": "key",
>       "id": "1",
>       "name": "keyUp",
>       "code": "a"}
>     ]
>
>
> I think this would be fine; the original reason for doing this was to
> simplify the data structure that was being sent over. As long as we do
>
> {"actions": [
>     [{"type": "key",
>       "id": "1",
>       "name": "keyDown",
>       "code": "a"},
>      {"type": "key",
>       "id": "1",
>       "name": "keyUp",
>       "code": "a"}
>     ],
>     [{"type": "key",
>       "id": "2",
>       "name": "keyDown",
>       "code": "a"},
>      {"type": "key",
>       "id": "2",
>       "name": "keyUp",
>       "code": "a"}
>     ]
> ]}
>
> Originally Malini and I wanted to just take the array of actions and
> slice them accordingly without having to inspect each dictionary to get
> the `id`. Inspecting each dictionary in the array might be fine; it is just
> extra work that we didn't feel was necessary.
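For comparison, the per-dictionary inspection that a flat list would require is small. A minimal sketch, assuming each action dict carries `type` and `id` keys as in the examples above; `group_by_device` is a hypothetical helper, not part of the spec or any binding. Note that regrouping keeps each device's own order but discards the relative ordering between devices, which the nested form expresses through position.

```python
def group_by_device(actions):
    """Split a flat list of action dicts into one chain per (type, id)."""
    chains = {}  # dicts preserve insertion order (Python 3.7+)
    for action in actions:
        chains.setdefault((action["type"], action["id"]), []).append(action)
    return list(chains.values())


flat = [
    {"type": "key", "id": "1", "name": "keyDown", "code": "a"},
    {"type": "key", "id": "2", "name": "keyDown", "code": "a"},
    {"type": "key", "id": "1", "name": "keyUp", "code": "a"},
    {"type": "key", "id": "2", "name": "keyUp", "code": "a"},
]
chains = group_by_device(flat)
print([[a["name"] for a in chain] for chain in chains])
# [['keyDown', 'keyUp'], ['keyDown', 'keyUp']]
```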

OK, I think the problem here is that I hadn't fully understood the model 
from the spec. If the intent is that each parallel action chain refers 
to exactly one input device then it makes sense to use the structure on 
the diagram (i.e. the more nested option). It's just more awkward for 
"simple" cases (press a key, release it, move the mouse, click) because 
you need lots of padding operations to get the correct sequence. But 
maybe that isn't considered a huge problem (and clients could fix this 
by letting people mix and match in the API and inserting implicit 
padding as required).
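The implicit padding a client could do might look like the following minimal sketch. It assumes the user supplies one sequential list of `(device, action)` pairs and that a bare `{"name": "pause"}` means a zero-length pause; both the input shape and `pad_into_chains` are hypothetical.

```python
def pad_into_chains(steps):
    """Turn a sequential, mixed-device list into parallel per-device chains.

    Every step gets its own tick: the device that acts gets the action,
    every other device gets a pause, so the original order is preserved.
    """
    devices = []
    for device, _ in steps:
        if device not in devices:
            devices.append(device)
    chains = {device: [] for device in devices}
    for device, action in steps:
        for other in devices:
            chains[other].append(action if other == device else {"name": "pause"})
    return [chains[device] for device in devices]


steps = [
    ("key", {"name": "keyDown", "code": "a"}),
    ("key", {"name": "keyUp", "code": "a"}),
    ("pointer", {"name": "pointerMove", "x": 10, "y": 20}),
    ("pointer", {"name": "pointerDown"}),
    ("pointer", {"name": "pointerUp"}),
]
for chain in pad_into_chains(steps):
    print([a["name"] for a in chain])
# ['keyDown', 'keyUp', 'pause', 'pause', 'pause']
# ['pause', 'pause', 'pointerMove', 'pointerDown', 'pointerUp']
```

This is correct but not minimal; a smarter client could put actions on different devices into the same tick when the user marks them as simultaneous.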

>     Should any of the fields be optional e.g. should it be OK to send an
>     action without an id, and have the remote end use an implicit id for
>     this undefined case? That would almost always be the right thing for
>     keyboards, for example. If that doesn't happen, and the id format is
>     "any string" it seems likely that local ends are going to have to
>     send a uuid id as a default (to prevent it clashing with any later
>     user-defined ids).
>
>
> I think `type` needs to be there, but if `id` is missing we can assume
> that everything is happening on the one implicit input device of that
> type. The tricky part here is if we get
>
>     [{"type": "key",
>       "name": "keyDown",
>       "code": "a"},
>      {"type": "click",
>       "name": "click",
>       "code": "1"}
>     ]
>
> We need to raise an error somewhere, preferably before dispatching
> anything, so that we don't get halfway through the sequence and then fail
> because we can't do clicks on keyboards. This suggests, in answer to your
> previous question, that we may want to inspect the dicts as we parse the data.
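Inspecting the dicts while parsing could cover both points in this exchange: filling in a missing `id`, and rejecting a chain that mixes device types before anything is dispatched. A minimal sketch under those assumptions; `parse_chain`, `InvalidArgument` and the per-type default-id table are hypothetical names, and generating the default with `uuid.uuid4()` follows the suggestion in the question rather than anything the spec says.

```python
import uuid


class InvalidArgument(ValueError):
    """Stand-in for a WebDriver "invalid argument" error."""


def parse_chain(chain, default_ids):
    """Validate and normalise one chain before anything is dispatched.

    - Every action needs a "type"; the first action fixes the chain's type.
    - A missing "id" gets one implicit id per type, generated once and kept
      in default_ids, so it cannot clash with ids a user chooses later.
    - An action whose type differs from the chain's type is rejected.
    """
    parsed = []
    chain_type = None
    for index, action in enumerate(chain):
        if "type" not in action:
            raise InvalidArgument(f"action {index} has no type")
        if chain_type is None:
            chain_type = action["type"]
        elif action["type"] != chain_type:
            raise InvalidArgument(
                f"action {index}: {action['type']!r} action in a {chain_type!r} chain")
        if "id" not in action:
            if chain_type not in default_ids:
                default_ids[chain_type] = str(uuid.uuid4())
            action = dict(action, id=default_ids[chain_type])
        parsed.append(action)
    return parsed


defaults = {}
ok = parse_chain([{"type": "key", "name": "keyDown", "code": "a"},
                  {"type": "key", "name": "keyUp", "code": "a"}], defaults)
try:
    parse_chain([{"type": "key", "name": "keyDown", "code": "a"},
                 {"type": "click", "name": "click", "code": "1"}], defaults)
except InvalidArgument as err:
    print(err)  # action 1: 'click' action in a 'key' chain
```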

Makes sense.

For an input device like a pointer, the pointer can clearly only do one 
thing at a time. For a keyboard that is less true. Should it be possible 
to have multiple concurrent action chains corresponding to the "same" 
keyboard?

>     Is it intended that the full payload is validated before any actions
>     are taken? The current spec is specifically written in a way where
>     the actions will partially complete if there is an entry half-way in
>     with the wrong format, but I suspect this is an oversight.
>
>
> This is an oversight.

Good :)
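Validating the full payload first amounts to a two-phase request handler. A minimal sketch, assuming `validate(chain)` raises on a malformed chain and `dispatch(action)` sends one action to the browser; both are caller-supplied stand-ins, and `perform_actions` is a hypothetical name.

```python
from itertools import zip_longest


def perform_actions(chains, validate, dispatch):
    """All-or-nothing: validate every chain, then dispatch tick by tick."""
    # Phase 1: nothing has reached the browser yet, so an error here
    # cannot leave a half-completed sequence behind.
    for chain in chains:
        validate(chain)
    # Phase 2: tick i holds the i-th action of every chain.
    for tick in zip_longest(*chains):
        for action in tick:
            if action is not None:  # shorter chains have already finished
                dispatch(action)


def validate(chain):
    for action in chain:
        if "name" not in action:
            raise ValueError("malformed action: %r" % (action,))


sent = []
try:
    perform_actions([[{"name": "keyDown", "code": "a"}, {"code": "a"}]],
                    validate, sent.append)
except ValueError as err:
    print(err)
print(sent)  # []: the valid first entry was never dispatched
```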

>     == General Semantics ==
>
>     The assumption of the current specification seems to be that for
>     actions that produce internal state, that state can persist for
>     longer than a single API call, and that all such state is removed by
>     the DELETE endpoint. However, it's not clear to me whether there's a
>     use case for this, or what the semantics of releasing state are (this
>     also applies to the alternate scenario where state is released at
>     the end of each API call). Consider for example:
>
>     [
>     [{keyDown a}],
>     [{keyDown b}]
>     ]
>
>     When the state is released, does this work like sending {keyUp a}
>     and {keyUp b} actions? Which order do such implicit actions occur
>     in? Or is the idea that you just purge internal state without having
>     any other effect? This latter option seems problematic if the
>     browser or content assumes that it will always get matched pairs of
>     certain events.
>
>
> The use case is that people can do "half" a sequence and then assert on
> the current state of the application under test (AUT). For example
>
> pointerMove(x, y)
> assertThat(element.isDisplayed())
> pointerClick()
> ... # rest of sequence

I see.
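To make the open question concrete, here is a minimal sketch of keyboard state that outlives a single request and is released later. The `release()` behaviour (synthesising a `keyUp` for each held key, most recently pressed first) is only one possible answer to the ordering question raised above, not something the spec or this thread settles; `KeyboardState` is a hypothetical class.

```python
class KeyboardState:
    """Hypothetical keyboard state kept by the remote end between requests."""

    def __init__(self):
        self.pressed = []  # keys currently held down, in press order
        self.events = []   # every event dispatched so far

    def key_down(self, key):
        if key not in self.pressed:
            self.pressed.append(key)
        self.events.append(("keydown", key))

    def key_up(self, key):
        if key in self.pressed:
            self.pressed.remove(key)
        self.events.append(("keyup", key))

    def release(self):
        # One possible semantics: emit a keyUp for every held key, most
        # recently pressed first, so content always sees matched pairs.
        for key in reversed(self.pressed[:]):
            self.key_up(key)


state = KeyboardState()
state.key_down("a")  # first request
state.key_down("b")  # second request; "a" is still held
state.release()      # e.g. triggered by the DELETE endpoint
print(state.events)
# [('keydown', 'a'), ('keydown', 'b'), ('keyup', 'b'), ('keyup', 'a')]
```

The alternative, purging `pressed` without emitting anything, would leave content that tracks held keys believing `a` and `b` are still down.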

>     == Pause Action / Temporal Ordering ==
>
>     It is unclear to me how the temporal ordering is supposed to work in
>     general. I assume that simultaneous actions are intended to happen
>     left-to-right, top-to-bottom so that given
>
>     [
>     [{a}, {b}],
>     [{c}, {d}]
>     ]
>
>     the order of starting each action would be a,b,c,d. However it seems
>     that the pause action can take a duration. What use cases is this
>     supposed to cover, and when exactly do things happen? e.g. if I have
>     (assuming, for brevity, that pauses are measured in seconds):
>
>     [
>     [{keyDown a}, {pause 1},         {keyUp a}]
>     [{pause 2},   {pointerMove 10 20}, {pause 3}]
>     ]
>
>     Should I expect the behaviour to be press down a, 2s pause,
>     instantaneous pointer move to 10,20, 1 second pause, lift a, 3
>     second pause? Or are the events supposed to happen at some other
>     point relative to the pause (e.g. in the middle of the tick?).
>     Should events like mouseMove be "smeared out" over the tick duration
>     somehow (e.g. a linear interpolation of the position firing an event
>     every 16ms, or using requestAnimationFrame, or whatever)?
>
>
> Each event should be dispatched straight away. I don't want them smeared,
> because you might want to draw something on a canvas or play a game,
> e.g. a game that handles the `keyDown` event to move forward and clicks
> to fire bubbles at aliens as you run past them.
>
> [
> [{keyDown a}, {pause 1}, {pause 1}, {pause 1}, {keyUp a}],
> [{pointerMove 10 20}, {pointerDown}, {pointerMove 20 30}, {pointerDown}]
> ]

Ok, so events are always dispatched as close to the start of a tick as 
possible?
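A minimal sketch of the tick model as described in this exchange, assuming that tick i holds the i-th action of every chain, that every non-pause action in a tick is dispatched immediately at its start (in chain order), and that a tick lasts as long as its longest pause. Actions are written as plain tuples for brevity; `timeline` is a hypothetical helper.

```python
from itertools import zip_longest


def timeline(chains):
    """Return ([(start_time, action), ...], total_duration) for parallel chains."""
    now = 0
    schedule = []
    for tick in zip_longest(*chains):
        duration = 0
        for action in tick:
            if action is None:  # this chain has already finished
                continue
            name, *args = action
            if name == "pause":
                duration = max(duration, args[0])
            else:
                schedule.append((now, action))  # dispatched at tick start
        now += duration
    return schedule, now


chains = [
    [("keyDown", "a"), ("pause", 1), ("keyUp", "a")],
    [("pause", 2), ("pointerMove", 10, 20), ("pause", 3)],
]
schedule, total = timeline(chains)
for start, action in schedule:
    print(start, action)
print("done at", total)
```

Run against the example from the question, this gives press `a` at t=0, pointer move at t=2, lift `a` at t=3, with the sequence ending at t=6, which matches the behaviour James describes.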

>
>
>     == Elements ==
>
>     It seems that some actions (keys, events) are supposed to be
>     relative to an element, but there isn't anything in the protocol to
>     specify which element. So, for those actions how does one supply the
>     element?
>
>
> The client bindings set the focus on the element, and everything after
> that assumes you are working with the active element. See
> https://github.com/SeleniumHQ/selenium/blob/master/py/selenium/webdriver/common/action_chains.py#L156
> as an example.

OK, so there is no way to supply an element with the action, or to change 
the element mid-chain, other than by dispatching actions that move the focus.
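Since the payload has no element field, a binding that offers "type into this element" has to turn it into actions that move focus first. A minimal sketch, assuming the binding has already resolved the element to viewport coordinates and that a left click focuses it; `send_keys_to_element`, the `"mouse"`/`"keyboard"` ids and the field names are illustrative, using the nested one-chain-per-device shape from the diagram.

```python
def send_keys_to_element(element_center, text):
    """Hypothetical binding helper: click to focus, then type `text`."""
    x, y = element_center
    pointer = [
        {"name": "pointerMove", "x": x, "y": y},
        {"name": "pointerDown", "button": 0},
        {"name": "pointerUp", "button": 0},
    ]
    keys = []
    for ch in text:
        keys.append({"name": "keyDown", "code": ch})
        keys.append({"name": "keyUp", "code": ch})
    # Pause padding keeps the chains in step: typing starts only after
    # the click that moves focus has completed.
    pointer_chain = pointer + [{"name": "pause"} for _ in keys]
    key_chain = [{"name": "pause"} for _ in pointer] + keys
    return [
        {"type": "pointer", "id": "mouse", "actions": pointer_chain},
        {"type": "key", "id": "keyboard", "actions": key_chain},
    ]


payload = send_keys_to_element((50, 20), "hi")
for chain in payload:
    print(chain["id"], [a["name"] for a in chain["actions"]])
```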

>     == Key Actions ==
>
>     Fundamentally I am unclear what key event model people want to
>     standardise. I have seen lots of conversations around specific
>     keyboard layouts and IMEs and so on. At the same time many platforms
>     now don't present physical keyboards, and the kind of interaction
>     you get from something like Swype doesn't seem possible to model in
>     the current specification. I think interoperability is possible
>     through a model in which key actions generate (well-specified) DOM
>     events, and above-browser parts of the system (compose key, soft
>     keyboard, IME, etc.) are abstracted away. Is there a strong reason
>     that this simple model is not good enough?
>
>
> I think if we pick something from
> https://www.w3.org/TR/uievents-code/#keyboard-common-layouts we can then
> get what we need. Seeing as we have

[...]

It's not yet clear to me exactly what effect the choice of keyboard 
layout has if you can send any codepoint as the 'key' to press. Maybe 
someone can fill me in?
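One way to see where the layout matters is to separate the physical key (`code`, as in the UI Events code spec linked above) from the character it produces (`key`). A minimal sketch with two deliberately partial tables, US QWERTY and French AZERTY; the mappings are real for these four keys, but `LAYOUTS` and the helper names are hypothetical.

```python
# Partial tables: physical key `code` -> unshifted character produced.
LAYOUTS = {
    "en-US": {"KeyQ": "q", "KeyW": "w", "KeyA": "a", "KeyZ": "z"},
    "fr-FR": {"KeyQ": "a", "KeyW": "z", "KeyA": "q", "KeyZ": "w"},
}


def key_for_code(code, layout):
    """If actions name physical keys, the layout decides the character."""
    return LAYOUTS[layout][code]


def code_for_key(key, layout):
    """If actions name characters, the layout only decides which physical
    key (and so which `code` the DOM event reports) produced it."""
    for code, produced in LAYOUTS[layout].items():
        if produced == key:
            return code
    raise KeyError(f"{key!r} is not on layout {layout}")


print(key_for_code("KeyQ", "en-US"), key_for_code("KeyQ", "fr-FR"))  # q a
print(code_for_key("a", "en-US"), code_for_key("a", "fr-FR"))  # KeyA KeyQ
```

So if any codepoint can be sent as the key, the layout would only affect the `code` reported alongside it, plus whatever modifiers a real keyboard would need to produce that character.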

>
>     Is it expected that the keyboard model has key repetition, e.g. if I do
>
>     [[{keyDown a}, {pause 10}, {keyUp a}]]
>
>     when the focus is on an input control, how many "a" characters
>     should I see?
>
>
> I didn't originally think of this, but following from my previous answer
> above, I think we should see a "few" a's.

Based on followup discussion, it seems like the answer is that you 
should get a single event per action.
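Under the one-event-per-action reading, holding a key through a long pause produces no extra characters. A minimal sketch modelling only what lands in a focused text input; `typed_text` and the `duration` field are hypothetical.

```python
def typed_text(actions):
    """Characters entered when each action yields exactly one event
    (no auto-repeat): only keyDown inserts, and only once."""
    value = []
    for action in actions:
        if action["name"] == "keyDown":
            value.append(action["code"])
    return "".join(value)


held = [
    {"name": "keyDown", "code": "a"},
    {"name": "pause", "duration": 10},  # however long, nothing repeats
    {"name": "keyUp", "code": "a"},
]
print(typed_text(held))  # a
```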

>
>
>     == Pointer Actions ==
>
>     It seems like pointer actions are always specified relative to an
>     element? Is this correct, or should it also be possible to specify
>     relative to the viewport?
>
>     There is an open issue about dispatching touch events and other
>     kinds of events. How will this be handled?
>
>
> I would love for someone to share some thoughts on this. The issue I can
> see is that with some devices, like a Surface, you can have touch events
> when using the screen but you can also have a mouse; which ones should
> we send if we could detect both? Since Touch/Pointer is a minefield, it
> would be great to get this nailed down.

Should they simply be different action types? Possibly there would have 
to be a way to signal that a particular browser / device didn't support 
a particular class of actions.
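If touch, mouse and pen were separate action types, the remote end could reject a request it cannot honour before dispatching anything. A minimal sketch, assuming the session knows a set of supported action types; `check_supported`, `UnsupportedOperation` and the type names are hypothetical (WebDriver does define an "unsupported operation" error, but its use here is an assumption).

```python
class UnsupportedOperation(Exception):
    """Stand-in for a WebDriver "unsupported operation" error."""


def check_supported(chains, supported_types):
    """Reject the whole request if any chain uses an unsupported action type."""
    requested = {chain["type"] for chain in chains}
    missing = sorted(requested - set(supported_types))
    if missing:
        raise UnsupportedOperation("unsupported action types: " + ", ".join(missing))


desktop = {"key", "mouse"}
check_supported([{"type": "key", "actions": []}], desktop)  # passes silently
try:
    check_supported([{"type": "touch", "actions": []}], desktop)
except UnsupportedOperation as err:
    print(err)  # unsupported action types: touch
```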
Received on Friday, 29 April 2016 15:56:56 UTC