- From: James Graham <james@hoppipolla.co.uk>
- Date: Wed, 20 Apr 2016 13:47:42 +0100
- To: "public-browser-tools-testing@w3.org" <public-browser-tools-testing@w3.org>
- Cc: tdresser@google.com, lanwei@google.com, Sam Uong <samuong@chromium.org>
I am trying to understand the intent behind the actions section in the
spec, so I can shore up the text a bit, and make it implementable. I
have some of the notes from the meeting at Facebook London, but still
have many outstanding questions, some of which are below.
== Protocol ==
I think the spec currently states that the top-level message structure
is an object, like
{"actions": []}
Other diagrams indicate a list directly at the top level. Which should
we choose? I feel like the former is closer in structure to the other
commands in the spec, which all pass parameters as an object. It also
provides some measure of future-proofing as we can extend the top level
of the parameters in the future without ugly hacks.
Inside this top-level list, the spec suggests further lists, but I have
a diagram here that suggests an object:
{"type": "key",
 "id": "1",
 "actions": [{"name": "keyDown",
              "code": "a"},
             {"name": "keyUp",
              "code": "a"}
            ]
}
Compared to what's in the spec, this allows an id attribute, which is
needed for e.g. multi-touch. But it's more constrained in that each
sequence of actions can only refer to a specific device, so if I wanted
to have some pointer actions and some key actions, in a sequence, I
would have to send multiple chains, padding with pause actions. Does it
make sense to put the type and id on each action entry, like:
[{"type": "key",
  "id": "1",
  "name": "keyDown",
  "code": "a"},
 {"type": "key",
  "id": "1",
  "name": "keyUp",
  "code": "a"}
]
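
For comparison, the per-device object form and the per-action form above carry the same information, and one can be derived mechanically from the other. A minimal Python sketch of that conversion, assuming string ids; the function name `flatten` is hypothetical and not part of the spec:

```python
def flatten(sequence):
    """Copy a sequence's "type" and "id" onto each of its actions.

    `sequence` is the per-device object form; the result is the
    per-action list form. Both shapes are taken from the examples
    in this message, not from the spec text.
    """
    return [
        {"type": sequence["type"], "id": sequence["id"], **action}
        for action in sequence["actions"]
    ]


per_device = {
    "type": "key",
    "id": "1",
    "actions": [
        {"name": "keyDown", "code": "a"},
        {"name": "keyUp", "code": "a"},
    ],
}
per_action = flatten(per_device)
```

The reverse direction needs a grouping rule (for example, by `(type, id)`), which is where the two forms stop being equivalent once actions for different devices are interleaved.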
Should any of the fields be optional, e.g. should it be OK to send an
action without an id, and have the remote end use an implicit id for
this undefined case? That would almost always be the right thing for
keyboards, for example. If that doesn't happen, and the id format is
"any string" it seems likely that local ends are going to have to send a
uuid id as a default (to prevent it clashing with any later user-defined
ids).
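
To illustrate the workaround a local end would need if ids are mandatory, here is a rough Python sketch. It assumes the per-action form, and that one generated id per device type is enough; the name `fill_default_ids` is hypothetical:

```python
import uuid


def fill_default_ids(actions):
    """Return copies of the actions, adding an id wherever one is missing.

    Assumption: all id-less actions of the same "type" belong to a single
    implicit device, so they share one generated uuid.
    """
    defaults = {}
    filled = []
    for action in actions:
        entry = dict(action)  # do not mutate the caller's data
        if "id" not in entry:
            device_type = entry["type"]
            if device_type not in defaults:
                defaults[device_type] = str(uuid.uuid4())
            entry["id"] = defaults[device_type]
        filled.append(entry)
    return filled


actions = [
    {"type": "key", "name": "keyDown", "code": "a"},
    {"type": "key", "name": "keyUp", "code": "a"},
    {"type": "pointer", "id": "finger1", "name": "pointerMove"},
]
filled = fill_default_ids(actions)
```

If the remote end supplied the implicit id instead, none of this would be needed on the local end.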
Is it intended that the full payload is validated before any actions are
taken? The current spec is specifically written in a way where the
actions will partially complete if there is an entry half-way in with
the wrong format, but I suspect this is an oversight.
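
The validate-first behaviour could look roughly like the following Python sketch. The required fields are an assumption based on the per-action example above, and `InvalidArgument`, `validate`, and `perform` are hypothetical names:

```python
REQUIRED_FIELDS = ("type", "id", "name")  # assumed, not taken from the spec


class InvalidArgument(Exception):
    """Raised when the payload is malformed."""


def validate(actions):
    """Check the whole payload; raise on the first malformed entry."""
    if not isinstance(actions, list):
        raise InvalidArgument("actions must be a list")
    for index, action in enumerate(actions):
        if not isinstance(action, dict):
            raise InvalidArgument(f"entry {index} is not an object")
        for field in REQUIRED_FIELDS:
            if field not in action:
                raise InvalidArgument(f"entry {index} is missing {field!r}")


def perform(actions, dispatch):
    """Dispatch every action, but only after the full payload validates."""
    validate(actions)
    for action in actions:
        dispatch(action)
```

With this shape, a bad entry half-way through the list means nothing is dispatched at all, rather than the first half of the chain taking effect.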
== General Semantics ==
The assumption of the current specification seems to be that for actions
that produce internal state, that state can persist for longer than a
single API call, and that all such state is removed by the DELETE
endpoint. However it's not clear to me if there's a use case for this, or
what the semantics are of releasing state (this also applies to the
alternate scenario where state is released at the end of each API call).
Consider for example:
[
[{keyDown a}],
[{keyDown b}]
]
When the state is released, does this work like sending {keyUp a} and
{keyUp b} actions? Which order do such implicit actions occur in? Or is
the idea that you just purge internal state without having any other
effect? This latter option seems problematic if the browser or content
assumes that it will always get matched pairs of certain events.
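
One possible answer to the ordering question, sketched in Python: release generates an implicit keyUp for each key still held, most recently pressed first. The reverse order is purely an assumption for illustration, and `release_actions` is a hypothetical name:

```python
def release_actions(pressed_keys):
    """Build implicit keyUp actions for keys that are still held down.

    `pressed_keys` lists key codes in the order they were pressed.
    Assumption: keys are released in reverse order of pressing.
    """
    return [{"name": "keyUp", "code": code} for code in reversed(pressed_keys)]


implicit = release_actions(["a", "b"])
```

Under this reading the example above would release b and then a, so content always sees matched keyDown/keyUp pairs.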
== Pause Action / Temporal Ordering ==
It is unclear to me how the temporal ordering is supposed to work in
general. I assume that simultaneous actions are intended to happen
left-to-right, top-to-bottom so that given
[
[{a}, {b}],
[{c}, {d}]
]
the order of starting each action would be a,b,c,d. However it seems
that the pause action can take a duration. What use cases is this
supposed to cover, and when exactly do things happen? e.g. if I have
(assuming pauses are measured in seconds for brevity):
[
[{keyDown a}, {pause 1}, {keyUp a}],
[{pause 2}, {pointerMove 10 20}, {pause 3}]
]
Should I expect the behaviour to be: press down a, 2 second pause,
instantaneous pointer move to 10,20, 1 second pause, lift a, 3 second
pause? Or are the events supposed to happen at some other point relative
to the pause (e.g. in the middle of the tick)? Should events like
mouseMove be "smeared out" over the tick duration somehow (e.g. a linear
interpolation of the position firing an event every 16ms, or using
requestAnimationFrame, or whatever)?
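
To pin down one reading of the example, here is a Python sketch. It assumes the n-th entries of each list form one tick, that non-pause actions fire at the start of the tick, and that a tick lasts as long as its longest pause. It also shows what a 16ms linear "smear" of a pointer move might produce. The names `tick_schedule` and `interpolate_move` and the dict shapes are hypothetical:

```python
from itertools import zip_longest


def tick_schedule(sequences):
    """Return ([(start_time, [action names fired]), ...], total_duration)."""
    schedule = []
    start = 0
    for tick in zip_longest(*sequences):
        actions = [a for a in tick if a is not None]
        fired = [a["name"] for a in actions if a["name"] != "pause"]
        # The tick lasts as long as the longest pause it contains.
        duration = max(
            (a.get("duration", 0) for a in actions if a["name"] == "pause"),
            default=0,
        )
        schedule.append((start, fired))
        start += duration
    return schedule, start


def interpolate_move(start, end, duration_ms, step_ms=16):
    """Linearly interpolate a pointer move, one point per step_ms."""
    steps = max(1, duration_ms // step_ms)
    return [
        (
            start[0] + (end[0] - start[0]) * i / steps,
            start[1] + (end[1] - start[1]) * i / steps,
        )
        for i in range(1, steps + 1)
    ]


sequences = [
    [{"name": "keyDown"}, {"name": "pause", "duration": 1}, {"name": "keyUp"}],
    [{"name": "pause", "duration": 2}, {"name": "pointerMove"},
     {"name": "pause", "duration": 3}],
]
schedule, total = tick_schedule(sequences)
points = interpolate_move((0, 0), (10, 20), 1000)
```

Under these assumptions keyDown fires at 0s, pointerMove at 2s, keyUp at 3s, and the chain ends at 6s, which matches the first reading in the paragraph above. A different rule for where in the tick an action fires would change only the `schedule.append` line.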
== Elements ==
It seems that some actions (keys, events) are supposed to be relative to
an element, but there isn't anything in the protocol to specify which
element. So, for those actions how does one supply the element?
== Key Actions ==
Fundamentally I am unclear what key event model people want to
standardise. I have seen lots of conversations around specific keyboard
layouts and IMEs and so on. At the same time many platforms now don't
present physical keyboards, and the kind of interaction you get from
something like Swype doesn't seem possible to model in the current
specification. I think interoperability is possible through a model in
which key actions generate (well-specified) DOM events, and
above-browser parts of the system (compose key, soft keyboard, IME,
etc.) are abstracted away. Is there a strong reason that this simple
model is not good enough?
Is it expected that the keyboard model has key repetition, e.g. if I do
[[{keyDown a}, {pause 10}, {keyUp a}]]
when the focus is on an input control, how many "a" characters should I see?
== Pointer Actions ==
It seems like pointer actions are always specified relative to an
element. Is this correct, or should it also be possible to specify them
relative to the viewport?
There is an open issue about dispatching touch events and other kinds of
events. How will this be handled?
There is probably a lot more to clarify, but that seems like a
reasonable start…
Received on Wednesday, 20 April 2016 12:48:09 UTC