Re: Actions questions from David Burns on 2016-04-20 (public-browser-tools-testing@w3.org from April to June 2016)

From: David Burns <dburns@mozilla.com>
Date: Wed, 20 Apr 2016 21:08:06 +0100
To: James Graham <james@hoppipolla.co.uk>
Cc: "public-browser-tools-testing@w3.org" <public-browser-tools-testing@w3.org>, tdresser@google.com, lanwei@google.com, Sam Uong <samuong@chromium.org>
Message-ID: <CAAoW2AH=yYX-3xQr2L+HCT1mfeqKWj09Ru9TJF-CTOMzk71fsw@mail.gmail.com>
Hi James,

I have tried to reply to each of the items below.

David

On 20 April 2016 at 13:47, James Graham <james@hoppipolla.co.uk> wrote:

> I am trying to understand the intent behind the actions section in the
> spec, so I can shore up the text a bit, and make it implementable. I have
> some of the notes from the meeting at Facebook London, but many outstanding
> questions, some of which are below.
>
> == Protocol ==
>
> I think the spec currently states that the top-level message structure is
> an object, like
>
> {"actions": []}
>
> Other diagrams indicate a list directly at the top level. Which should we
> choose? I feel like the former is closer in structure to the other commands
> in the spec, which all pass parameters as an object. It also provides some
> measure of future-proofing as we can extend the top level of the parameters
> in the future without ugly hacks.
>

I agree we should use {"actions": []}.
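A minimal sketch of what a local end might send with this shape. It uses Python's standard `json` module and a hypothetical helper, `build_actions_payload`. The action field names come from the examples in this thread, not from a finished spec.

```python
import json


def build_actions_payload(sequences):
    """Wrap per-device action sequences in a top-level parameters object.

    Because the top level is an object rather than a bare list, further
    parameters could be added beside "actions" later without breaking
    existing clients.
    """
    return {"actions": sequences}


payload = build_actions_payload([
    [{"type": "key", "id": "1", "name": "keyDown", "code": "a"},
     {"type": "key", "id": "1", "name": "keyUp", "code": "a"}],
])
body = json.dumps(payload)  # the request body a local end would send
```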



>
> Inside this top-level list, the spec suggests further lists, but I have a
> diagram here that suggests an object:
>
> {"type": "key",
>  "id": "1",
>  "actions": [{"name": "keyDown",
>               "code": "a"},
>              {"name": "keyUp",
>               "code": "a"}
>             ]
> }
>
> Compared to what's in the spec, this allows an id attribute, which is
> needed for e.g. multi-touch. But it's more constrained in that each
> sequence of actions can only refer to a specific device, so if I wanted to
> have some pointer actions and some key actions, in a sequence, I would have
> to send multiple chains, padding with pause actions. Does it make sense to
> put the type and id on each action entry, like:
>
> [{"type": "key",
>   "id": "1",
>   "name": "keyDown",
>   "code": "a"},
>  {"type": "key",
>   "id": "1",
>   "name": "keyUp",
>   "code": "a"}
> ]
>

I think this would be fine. The original reason for the nested lists was to
simplify the data structure being sent over. It works as long as we keep
one inner list per device, like:

{"actions": [
    [{"type": "key",
      "id": "1",
      "name": "keyDown",
      "code": "a"},
     {"type": "key",
      "id": "1",
      "name": "keyUp",
      "code": "a"}
    ],
    [{"type": "key",
      "id": "2",
      "name": "keyDown",
      "code": "a"},
     {"type": "key",
      "id": "2",
      "name": "keyUp",
      "code": "a"}
    ]
  ]
}

Originally Malini and I wanted to just take the array of actions and slice
it accordingly, without having to inspect each dictionary to get the `id`.
Inspecting each dictionary in the array might be fine; it's just extra work
that we didn't feel was necessary.
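Here is a sketch of the trade-off, using hypothetical helper names. With the nested layout, the remote end can take each inner list as one device's sequence as-is. With a flat list, it has to read every dictionary's `id` to group actions by device.

```python
def sequences_from_nested(actions):
    """Nested layout: each inner list already is one device's sequence."""
    return [list(seq) for seq in actions]


def sequences_from_flat(actions):
    """Flat layout: inspect every action's "id" and group them by device.

    Dicts keep insertion order (Python 3.7+), so devices appear in the order
    their first action was seen, and each device keeps its relative order.
    """
    by_device = {}
    for action in actions:
        by_device.setdefault(action["id"], []).append(action)
    return list(by_device.values())


flat = [
    {"type": "key", "id": "1", "name": "keyDown", "code": "a"},
    {"type": "key", "id": "2", "name": "keyDown", "code": "a"},
    {"type": "key", "id": "1", "name": "keyUp", "code": "a"},
    {"type": "key", "id": "2", "name": "keyUp", "code": "a"},
]
grouped = sequences_from_flat(flat)
```

Grouping alone says nothing about how actions on different devices line up in time. That still has to come from somewhere else in the format.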


>
> Should any of the fields be optional e.g. should it be OK to send an
> action without an id, and have the remote end use an implicit id for this
> undefined case? That would almost always be the right thing for keyboards,
> for example. If that doesn't happen, and the id format is "any string" it
> seems likely that local ends are going to have to send a uuid id as a
> default (to prevent it clashing with any later user-defined ids).
>

I think `type` needs to be there, but if `id` is missing we can assume that
everything is happening on a single default device of that type. The tricky
part here is if we get

[{"type": "key",
  "name": "keyDown",
  "code": "a"},
 {"type": "click",
  "name": "click",
  "code": "1"}
]

We need to raise an error somewhere, preferably while parsing, so that we
don't get halfway through the sequence and then fail because we can't do
clicks on keyboards. This suggests, to your previous question, that we may
want to inspect the dicts as we parse the data.
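A sketch of that parse-time check. Several pieces are hypothetical and exist only to show how the error can fire before any event is dispatched: the table of allowed action names per `type`, the `"default-<type>"` implicit id, and the `InvalidArgument` exception.

```python
class InvalidArgument(Exception):
    """Hypothetical error raised while parsing, before anything is dispatched."""


# Hypothetical table of which action names each input type accepts.
ALLOWED_NAMES = {
    "key": {"keyDown", "keyUp", "pause"},
    "pointer": {"pointerDown", "pointerUp", "pointerMove", "pause"},
}


def normalise_sequence(sequence):
    """Validate one device's sequence and fill in an implicit id if absent.

    All actions in a sequence must target the same kind of device, so a
    "click" mixed into a keyboard sequence is rejected here instead of
    failing halfway through dispatch.
    """
    if not sequence:
        return []
    device_type = sequence[0].get("type")
    if device_type not in ALLOWED_NAMES:
        raise InvalidArgument(f"unknown input type {device_type!r}")
    normalised = []
    for index, action in enumerate(sequence):
        if action.get("type") != device_type:
            raise InvalidArgument(
                f"action {index}: type {action.get('type')!r} does not match {device_type!r}")
        if action.get("name") not in ALLOWED_NAMES[device_type]:
            raise InvalidArgument(
                f"action {index}: {action.get('name')!r} is not a {device_type} action")
        # Hypothetical default: one implicit device per input type.
        normalised.append({**action, "id": action.get("id", "default-" + device_type)})
    return normalised
```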


>
> Is it intended that the full payload is validated before any actions are
> taken? The current spec is specifically written in a way where the actions
> will partially complete if there is an entry half-way in with the wrong
> format, but I suspect this is an oversight.
>

This is an oversight.
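A minimal sketch of validate-then-dispatch. It assumes a hypothetical `dispatch` callback and a deliberately simple structural check. Every action in every sequence is checked first, so a malformed entry half-way through raises an error before any event has been sent. Dispatching one tick (column) at a time is an assumed ordering, not a settled one.

```python
def perform_actions(payload, dispatch):
    """Validate every action first, then dispatch; never partially complete.

    dispatch is a hypothetical callback that sends one action to the browser.
    """
    sequences = payload.get("actions")
    if not isinstance(sequences, list):
        raise ValueError("'actions' must be a list")

    # Pass 1: validation only. Nothing has been dispatched yet, so an error
    # here leaves the browser untouched.
    for s, sequence in enumerate(sequences):
        if not isinstance(sequence, list):
            raise ValueError(f"sequence {s} is not a list")
        for a, action in enumerate(sequence):
            if not (isinstance(action, dict) and "type" in action and "name" in action):
                raise ValueError(f"sequence {s}, action {a} is malformed")

    # Pass 2: dispatch tick by tick, reached only if the whole payload validated.
    ticks = max((len(seq) for seq in sequences), default=0)
    for tick in range(ticks):
        for sequence in sequences:
            if tick < len(sequence):
                dispatch(sequence[tick])
```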


>
> == General Semantics ==
>
> The assumption of the current specification seems to be that for actions
> that produce internal state, that state can persist for longer than a
> single API call, and that all such state is removed by the DELETE endpoint.
> However it's not clear to me if there's a use case for this, or what the
> semantics are of releasing state (this also applies to the alternate
> scenario where state is released at the end of each API call). Consider for
> example:
>
> [
> [{keyDown a}],
> [{keyDown b}]
> ]
>
> When the state is released, does this work like sending {keyUp a} and
> {keyUp b} actions? Which order do such implicit actions occur in? Or is the
> idea that you just purge internal state without having any other effect?
> This latter option seems problematic if the browser or content assumes that
> it will always get matched pairs of certain events.
>
>
The use case is that people can do "half" a sequence and then assert on the
current state of the application under test (AUT). For example:

pointerMove(x, y)
assertThat(element.isDisplayed())
pointerClick()
... # rest of sequence
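A sketch of the input state this implies, under stated assumptions. Pressed keys are kept per session between separate perform calls, so a test can assert in the middle of a gesture. `release` stands in for the DELETE endpoint and emits a synthetic `keyUp` for each key still down. Releasing in reverse press order is one hypothetical answer to the ordering question above; it is not something agreed in this thread.

```python
class InputState:
    """Hypothetical per-session keyboard state that outlives a single call."""

    def __init__(self):
        self.pressed = []  # keys currently held down, in press order
        self.events = []   # events dispatched so far, recorded for illustration

    def perform(self, actions):
        """Apply one API call's worth of (name, key) actions."""
        for name, key in actions:
            if name == "keyDown" and key not in self.pressed:
                self.pressed.append(key)
            elif name == "keyUp" and key in self.pressed:
                self.pressed.remove(key)
            self.events.append((name, key))

    def release(self):
        """Stand-in for the DELETE endpoint: lift every key still held.

        Releasing in reverse press order is an assumption, not agreed.
        """
        for key in reversed(self.pressed):
            self.events.append(("keyUp", key))
        self.pressed.clear()


state = InputState()
state.perform([("keyDown", "a")])  # first API call
state.perform([("keyDown", "b")])  # second call; "a" is still held
# ...a test could assert on the application under test here...
state.release()
```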


> == Pause Action / Temporal Ordering ==
>
> It is unclear to me how the temporal ordering is supposed to work in
> general. I assume that simultaneous actions are intended to happen
> left-to-right, top-to-bottom so that given
>
> [
> [{a}, {b}],
> [{c}, {d}]
> ]
>
> the order of starting each action would be a,b,c,d. However it seems that
> the pause action can take a duration. What use cases is this supposed to
> cover, and when exactly do things happen? e.g. if I have (assuming pauses
> are measured in s for brevity):
>
> [
> [{keyDown a}, {pause 1},         {keyUp a}]
> [{pause 2},   {pointerMove 10 20}, {pause 3}]
> ]
>
> Should I expect the behaviour to be press down a, 2s pause, instantaneous
> pointer move to 10,20, 1 second pause, lift a, 3 second pause? Or are the
> events supposed to happen at some other point relative to the pause (e.g.
> in the middle of the tick?). Should events like mouseMove be "smeared out"
> over the tick duration somehow (e.g. a linear interpolation of the position
> firing an event every 16ms, or using requestAnimationFrame, or whatever).
>

Each event should be dispatched straight away. I don't want them smeared,
because you might want to draw something on a canvas or play a game. E.g.
a game handles the `keyDown` event to move forward and clicks to fire
bubbles at aliens as you run past them:

[
[{keyDown a}, {pause 1}, {pause 1}, {pause 1}, {keyUp a}],
[{pointerMove 10 20}, {pointerDown}, {pointerMove 20 30}, {pointerDown}]
]
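One reading fits both James's worked example and "dispatched straight away": a tick model. Sequences are aligned by index into ticks. Every non-pause action in a tick fires immediately at the tick's start, in sequence order. The tick then lasts as long as its longest pause. The function `schedule` is hypothetical and only computes a timeline; pauses are in seconds, as in James's example.

```python
def schedule(sequences):
    """Return [(start_time, name, argument), ...] and the total duration.

    Assumed model: each tick's non-pause actions fire at the tick's start,
    and the tick lasts as long as its longest pause (0 if it has none).
    """
    timeline = []
    now = 0
    ticks = max((len(seq) for seq in sequences), default=0)
    for tick in range(ticks):
        longest_pause = 0
        for seq in sequences:
            if tick >= len(seq):
                continue  # this device has no action in this tick
            name, arg = seq[tick]
            if name == "pause":
                longest_pause = max(longest_pause, arg)
            else:
                timeline.append((now, name, arg))
        now += longest_pause
    return timeline, now


james_example = [
    [("keyDown", "a"), ("pause", 1), ("keyUp", "a")],
    [("pause", 2), ("pointerMove", (10, 20)), ("pause", 3)],
]
timeline, total = schedule(james_example)
```

This yields `keyDown a` at 0 s, `pointerMove` at 2 s, `keyUp a` at 3 s, and a total of 6 s. That matches the "press down a, 2s pause, pointer move, 1 second pause, lift a, 3 second pause" reading.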


>
> == Elements ==
>
> It seems that some actions (keys, events) are supposed to be relative to
> an element, but there isn't anything in the protocol to specify which
> element. So, for those actions how does one supply the element?
>

The client bindings set the focus on the element, and the rest of the
actions are assumed to apply to the active element. See
https://github.com/SeleniumHQ/selenium/blob/master/py/selenium/webdriver/common/action_chains.py#L156
for an example.
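A sketch of that division of labour, with hypothetical names throughout. The local end turns a high-level "send keys to this element" call into a focusing step followed by plain key actions. The remote end then applies those to whatever element is active. This does not reproduce the Selenium code linked above. The `focus` step is a placeholder for however a binding chooses to focus the element (for example, by clicking it).

```python
def send_keys_to_element(element_id, text):
    """Expand a high-level call into hypothetical (command, parameters) steps.

    Step 1 focuses the element. Step 2 is an ordinary key-action payload
    that carries no element reference; the remote end applies it to
    whichever element is active.
    """
    key_actions = []
    for char in text:
        key_actions.append({"type": "key", "name": "keyDown", "code": char})
        key_actions.append({"type": "key", "name": "keyUp", "code": char})
    return [
        ("focus", {"element": element_id}),
        ("actions", {"actions": [key_actions]}),
    ]


steps = send_keys_to_element("element-1", "hi")
```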


>
> == Key Actions ==
>
> Fundamentally I am unclear what key event model people want to
> standardise. I have seen lots of conversations around specific keyboard
> layouts and IMEs and so on. At the same time many platforms now don't
> present physical keyboards, and the kind of interaction you get from
> something like Swype doesn't seem possible to model in the current
> specification. I think interoperability is possible through a model in
> which key actions generate (well-specified) DOM events, and above-browser
> parts of the system (compose key, soft keyboard, IME, etc.) are abstracted
> away. Is there a strong reason that this simple model is not good enough?
>
>
I think if we pick one of the layouts from
https://www.w3.org/TR/uievents-code/#keyboard-common-layouts we can get what
we need. Seeing as we have


> Is it expected that the keyboard model has key repetition e.g. if I do
>
> [[{keyDown a}, {pause 10}, {keyUp a}]]
>
> when the focus is on an input control, how many "a" characters should I
> see?
>

I hadn't originally thought of this, but following from my answer above, I
think we should see a "few" a's.
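A sketch of one way to put a number on "a few", assuming OS-style auto-repeat. The key produces one character on `keyDown`. Once it has been held for an initial delay, it produces one more per repeat interval until `keyUp`. The 500 ms delay and 50 ms interval are hypothetical defaults; real platforms and user settings differ.

```python
def repeated_chars(held_ms, delay_ms=500, interval_ms=50):
    """Count characters typed while a key is held for held_ms milliseconds.

    Assumed model: one character on keyDown; repeats fire at delay_ms,
    delay_ms + interval_ms, ... and a repeat landing exactly at keyUp still
    counts. Integer milliseconds avoid floating-point rounding surprises.
    """
    if held_ms < delay_ms:
        return 1
    return 2 + (held_ms - delay_ms) // interval_ms
```

Suppose `{pause 10}` meant 10 seconds. With these assumed values, `repeated_chars(10_000)` gives 192, while any hold shorter than the delay gives exactly one "a".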


>
> == Pointer Actions ==
>
> It seems like pointer actions are always specified relative to an element?
> Is this correct, or should it also be possible to specify relative to the
> viewport?
>
> There is an open issue about dispatching touch events and other kinds of
> events. How will this be handled?
>
>
I would love for someone to have some thoughts on this. The issue I can see
is that with some devices, like a Surface, you can get touch events when
using the screen but you can also use a mouse, so which ones should we send
if we could detect both? Since Touch/Pointer is a minefield, it would be
great to get this nailed down.
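If the local end declared which kind of pointer it was simulating, the remote end would not need to detect hardware at all. Here is a sketch under that assumption. `pointer_type` is a hypothetical per-device parameter, and the table lists standard DOM event names a press or release might produce. It is simplified: it ignores `click`, compatibility mouse events after touches, and whether a browser exposes Touch Events at all, which is the minefield mentioned above.

```python
# Hypothetical, simplified mapping from a declared pointer type to the DOM
# events a press ("down") or release ("up") would fire. The lists are not
# meant to be authoritative about ordering.
PRESS_EVENTS = {
    "mouse": {"down": ["pointerdown", "mousedown"], "up": ["pointerup", "mouseup"]},
    "pen": {"down": ["pointerdown", "mousedown"], "up": ["pointerup", "mouseup"]},
    "touch": {"down": ["pointerdown", "touchstart"], "up": ["pointerup", "touchend"]},
}


def events_for(pointer_type, phase):
    """Return the event names for one phase of a press on a given pointer type."""
    try:
        return PRESS_EVENTS[pointer_type][phase]
    except KeyError:
        raise ValueError(
            f"unsupported pointer type or phase: {pointer_type!r}, {phase!r}") from None
```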


>
> There is probably a lot more to clarify, but that seems like a reasonable
> start…
>
>
Received on Wednesday, 20 April 2016 20:08:35 UTC