Actions questions from James Graham on 2016-04-20 (public-browser-tools-testing@w3.org from April to June 2016)

From: James Graham <james@hoppipolla.co.uk>
Date: Wed, 20 Apr 2016 13:47:42 +0100
To: "public-browser-tools-testing@w3.org" <public-browser-tools-testing@w3.org>
Cc: tdresser@google.com, lanwei@google.com, Sam Uong <samuong@chromium.org>
Message-ID: <57177A6E.7090009@hoppipolla.co.uk>
I am trying to understand the intent behind the actions section in the 
spec, so I can shore up the text a bit, and make it implementable. I 
have some of the notes from the meeting at Facebook London, but many 
outstanding questions, some of which are below.

== Protocol ==

I think the spec currently states that the top-level message structure 
is an object, like

{"actions": []}

Other diagrams indicate a list directly at the top level. Which should 
we choose? I feel like the former is closer in structure to the other 
commands in the spec, which all pass parameters as an object. It also 
provides some measure of future-proofing as we can extend the top level 
of the parameters in the future without ugly hacks.

Inside this top-level list, the spec suggests further lists, but I have 
a diagram here that suggests an object:

{"type": "key",
  "id": "1",
  "actions": [{"name": "keyDown",
               "code": "a"},
              {"name": "keyUp",
               "code": "a"}
             ]
}

Compared to what's in the spec, this allows an id attribute, which is 
needed for e.g. multi-touch. But it's more constrained in that each 
sequence of actions can only refer to a single device, so if I wanted to 
interleave some pointer actions and some key actions in one sequence, I 
would have to send multiple chains, padded with pause actions. Does it 
make sense to put the type and id on each action entry, like:

[{"type": "key",
   "id": "1",
   "name": "keyDown",
   "code": "a"},
{"type": "key",
  "id": "1",
  "name": "keyUp",
  "code": "a"}
]
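To make the padding cost concrete, here is a minimal sketch that converts a flat list in the format above into one chain per device. Each entry gets its own tick, and every other device is padded with a pause. The flat format, the `flat_to_chains` name and the bare `{"name": "pause"}` entry are all assumptions for illustration. None of them comes from the spec.

```python
from typing import Any

# Hypothetical flat format: every entry carries "type" and "id" next to
# its action fields. This is an illustration, not a spec'd format.
Action = dict[str, Any]


def flat_to_chains(flat: list[Action]) -> dict[tuple[str, str], list[Action]]:
    """Split a flat action list into one chain per (type, id) device.

    Each flat entry occupies its own tick. Every other device gets a
    pause in that tick, so all chains end up the same length.
    """
    devices: list[tuple[str, str]] = []
    for entry in flat:
        # Ids are compared as strings (an assumption), so 1 and "1" match.
        key = (entry["type"], str(entry["id"]))
        if key not in devices:
            devices.append(key)

    chains: dict[tuple[str, str], list[Action]] = {d: [] for d in devices}
    for entry in flat:
        key = (entry["type"], str(entry["id"]))
        body = {k: v for k, v in entry.items() if k not in ("type", "id")}
        for device in devices:
            # The owning device gets the real action. Everyone else pauses.
            chains[device].append(body if device == key else {"name": "pause"})
    return chains
```

Every device ends up with one entry per action in the whole payload. That is the pause padding the per-device format forces on interleaved input.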

Should any of the fields be optional? E.g. should it be OK to send an 
action without an id and have the remote end use an implicit id in that 
case? That would almost always be the right thing for keyboards, for 
example. If that doesn't happen, and the id format is "any string", it 
seems likely that local ends are going to have to send a UUID as the 
default id (to prevent it clashing with any later user-defined ids).
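If ids did stay mandatory, a local end might fill them in along these lines. This is a minimal sketch that assumes one randomly generated id per device type, reused for the whole session. `DefaultIds` is a hypothetical helper and is not part of any client library.

```python
import uuid


class DefaultIds:
    """Hypothetical local-end helper: one random id per device type."""

    def __init__(self) -> None:
        self._ids: dict[str, str] = {}

    def fill(self, action: dict) -> dict:
        """Return the action with an id, adding the type's default if absent."""
        if "id" in action:
            return action  # a user-supplied id always wins
        # A UUID is unlikely to clash with ids the user picks later.
        default = self._ids.setdefault(action["type"], str(uuid.uuid4()))
        return {**action, "id": default}
```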

Is it intended that the full payload is validated before any actions are 
taken? The current spec is specifically written in a way where the 
actions will partially complete if there is an entry half-way in with 
the wrong format, but I suspect this is an oversight.
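Validating everything up front is straightforward to express. The sketch below assumes a hypothetical table of action names and required fields (not the spec's) and a caller-supplied `dispatch` function. Nothing is dispatched unless every entry passes.

```python
from typing import Any, Callable

# Hypothetical action names and required fields, for illustration only.
REQUIRED: dict[str, tuple[str, ...]] = {
    "keyDown": ("code",),
    "keyUp": ("code",),
    "pause": (),
    "pointerMove": ("x", "y"),
}


class InvalidArgument(Exception):
    pass


def validate(actions: list[dict[str, Any]]) -> None:
    """Raise InvalidArgument on the first malformed entry."""
    for index, action in enumerate(actions):
        name = action.get("name")
        if name not in REQUIRED:
            raise InvalidArgument(f"action {index}: unknown name {name!r}")
        missing = [f for f in REQUIRED[name] if f not in action]
        if missing:
            raise InvalidArgument(f"action {index}: missing {missing}")


def perform(actions: list[dict[str, Any]],
            dispatch: Callable[[dict[str, Any]], None]) -> None:
    validate(actions)  # reject the whole payload first...
    for action in actions:
        dispatch(action)  # ...so no action runs if any entry is malformed
```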

== General Semantics ==

The assumption of the current specification seems to be that for actions 
that produce internal state, that state can persist for longer than a 
single API call, and that all such state is removed by the DELETE 
endpoint. However, it's not clear to me whether there's a use case for this, or
what the semantics are of releasing state (this also applies to the 
alternate scenario where state is released at the end of each API call). 
Consider for example:

[
[{keyDown a}],
[{keyDown b}]
]

When the state is released, does this work like sending {keyUp a} and 
{keyUp b} actions? Which order do such implicit actions occur in? Or is 
the idea that you just purge internal state without having any other 
effect? This latter option seems problematic if the browser or content 
assumes that it will always get matched pairs of certain events.
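Here is one possible answer to the first question, sketched under assumptions. The remote end records keys in the order they were pressed. On release, it synthesizes `keyUp` actions in reverse order, so that e.g. modifiers pressed first are released last. `KeyState` and the reverse-order choice are hypothetical. The alternative of purging state silently would simply clear the list without returning any actions.

```python
class KeyState:
    """Hypothetical remote-end input state for a single keyboard."""

    def __init__(self) -> None:
        self.pressed: list[str] = []  # key codes in the order they went down

    def key_down(self, code: str) -> None:
        if code not in self.pressed:
            self.pressed.append(code)

    def key_up(self, code: str) -> None:
        if code in self.pressed:
            self.pressed.remove(code)

    def release_all(self) -> list[dict[str, str]]:
        """Synthesize matching keyUp actions, most recently pressed first."""
        undo = [{"name": "keyUp", "code": c} for c in reversed(self.pressed)]
        self.pressed.clear()
        return undo
```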

== Pause Action / Temporal Ordering ==

It is unclear to me how the temporal ordering is supposed to work in 
general. I assume that simultaneous actions are intended to happen 
left-to-right, top-to-bottom so that given

[
[{a}, {b}],
[{c}, {d}]
]

the order of starting each action would be a,b,c,d. However it seems 
that the pause action can take a duration. What use cases is this 
supposed to cover, and when exactly do things happen? e.g. if I have 
(assuming pauses are measured in seconds for brevity):

[
[{keyDown a}, {pause 1},         {keyUp a}],
[{pause 2},   {pointerMove 10 20}, {pause 3}]
]

Should I expect the behaviour to be: press a, 2 s pause, instantaneous 
pointer move to (10, 20), 1 s pause, release a, 3 s pause? Or are the 
events supposed to happen at some other point relative to the pause 
(e.g. in the middle of the tick)? Should events like mouseMove be 
"smeared out" over the tick duration somehow (e.g. by linearly 
interpolating the position and firing an event every 16 ms, or by using 
requestAnimationFrame, or some other mechanism)?
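The first reading of the example can be stated precisely. This sketch makes three assumptions:

- the i-th entry of every chain belongs to tick i;
- non-pause actions fire instantly at the start of their tick, in chain order;
- a tick lasts as long as its longest pause.

Action names are plain strings for brevity. Under those assumptions, it reproduces the sequence asked about: keyDown a at 0 s, the pointer move at 2 s, keyUp a at 3 s, and a total of 6 s.

```python
def timeline(chains: list[list[tuple[str, float]]]) -> tuple[list[tuple[float, str]], float]:
    """Return ([(start_time, action), ...], total_duration) for one tick model.

    Assumed model: column i of every chain is tick i; non-pause actions are
    instantaneous at the start of their tick; a tick lasts as long as its
    longest pause (0 if it has none).
    """
    events: list[tuple[float, str]] = []
    now = 0.0
    ticks = max(len(chain) for chain in chains)
    for i in range(ticks):
        longest = 0.0
        for chain in chains:
            if i >= len(chain):
                continue  # shorter chains simply sit this tick out
            name, duration = chain[i]
            if name == "pause":
                longest = max(longest, duration)
            else:
                events.append((now, name))
        now += longest
    return events, now


example = [
    [("keyDown a", 0), ("pause", 1), ("keyUp a", 0)],
    [("pause", 2), ("pointerMove 10 20", 0), ("pause", 3)],
]
events, total = timeline(example)
```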

== Elements ==

It seems that some actions (keys, events) are supposed to be relative to 
an element, but there isn't anything in the protocol to specify which 
element. So, for those actions how does one supply the element?

== Key Actions ==

Fundamentally, I am unclear about which key event model people want to 
standardise. I have seen lots of conversations around specific keyboard 
layouts and IMEs and so on. At the same time many platforms now don't 
present physical keyboards, and the kind of interaction you get from 
something like Swype doesn't seem possible to model in the current 
specification. I think interoperability is possible through a model in 
which key actions generate (well-specified) DOM events, and 
above-browser parts of the system (compose key, soft keyboard, IME, 
etc.) are abstracted away. Is there a strong reason that this simple 
model is not good enough?

Is the keyboard model expected to include key repeat? For example, if I do

[[{keyDown a}, {pause 10}, {keyUp a}]]

when the focus is on an input control, how many "a" characters should I see?
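The answer depends entirely on whether repeat is modelled and with what timing. As an illustration only, the count could be computed like this, assuming a hypothetical repeat delay and interval. Real values are platform and user settings, so the numbers in the usage line are made up. Integer milliseconds avoid floating-point rounding at the boundaries.

```python
def characters_typed(hold_ms: int, delay_ms: int, interval_ms: int,
                     repeat: bool = True) -> int:
    """Characters produced by one keyDown/keyUp pair (hypothetical model).

    The keydown itself produces one character. If repeat is on, the first
    repeat fires after delay_ms and another one every interval_ms after
    that, up to and including the moment of keyUp.
    """
    if not repeat or hold_ms < delay_ms:
        return 1  # only the initial keydown produces a character
    return 2 + (hold_ms - delay_ms) // interval_ms


# Made-up settings: 500 ms delay, 50 ms interval, key held for 10 s.
with_repeat = characters_typed(10_000, 500, 50)
without_repeat = characters_typed(10_000, 500, 50, repeat=False)
```

With these made-up settings, the 10 s hold gives 192 characters with repeat and 1 without. That gap is why the model needs pinning down.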

== Pointer Actions ==

It seems like pointer actions are always specified relative to an 
element? Is this correct, or should it also be possible to specify 
relative to the viewport?
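If both forms were allowed, the remote end would need to turn element-relative offsets into viewport coordinates. Here is a minimal sketch. It assumes the element's bounding box is already known in viewport coordinates, and that offsets are measured either from the element's top-left corner or from its centre. Both origins are assumptions; the spec text in question does not settle this. `Rect` and `to_viewport` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Rect:
    """Element bounding box in viewport coordinates (hypothetical type)."""
    x: float
    y: float
    width: float
    height: float


def to_viewport(offset_x: float, offset_y: float, origin: Optional[Rect],
                from_centre: bool = False) -> Tuple[float, float]:
    """Map a pointer offset to viewport coordinates.

    origin=None means the offset is already relative to the viewport.
    Otherwise it is relative to the element's top-left corner, or to its
    centre when from_centre is True.
    """
    if origin is None:
        return offset_x, offset_y
    base_x = origin.x + (origin.width / 2 if from_centre else 0)
    base_y = origin.y + (origin.height / 2 if from_centre else 0)
    return base_x + offset_x, base_y + offset_y
```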

There is an open issue about dispatching touch events and other kinds of 
events. How will this be handled?


There is probably a lot more to clarify, but that seems like a 
reasonable start…
Received on Wednesday, 20 April 2016 12:48:09 UTC