- From: James Graham <james@hoppipolla.co.uk>
- Date: Wed, 20 Apr 2016 13:47:42 +0100
- To: "public-browser-tools-testing@w3.org" <public-browser-tools-testing@w3.org>
- Cc: tdresser@google.com, lanwei@google.com, Sam Uong <samuong@chromium.org>
I am trying to understand the intent behind the actions section in the spec, so I can shore up the text a bit, and make it implementable. I have some of the notes from the meeting at Facebook London, but many outstanding questions, some of which are below.

== Protocol ==

I think the spec currently states that the top-level message structure is an object, like

    {"actions": []}

Other diagrams indicate a list directly at the top level. Which should we choose? I feel like the former is closer in structure to the other commands in the spec, which all pass parameters as an object. It also provides some measure of future-proofing as we can extend the top level of the parameters in the future without ugly hacks.

Inside this top-level list, the spec suggests further lists, but I have a diagram here that suggests an object:

    {"type": "key",
     "id": "1",
     "actions": [{"name": "keyDown", "code": "a"},
                 {"name": "keyUp", "code": "a"}]}

Compared to what's in the spec, this allows an id attribute, which is needed for e.g. multi-touch. But it's more constrained in that each sequence of actions can only refer to a specific device, so if I wanted to have some pointer actions and some key actions, in a sequence, I would have to send multiple chains, padding with pause actions. Does it make sense to put the type and id on each action entry, like:

    [{"type": "key", "id": "1", "name": "keyDown", "code": "a"},
     {"type": "key", "id": "1", "name": "keyUp", "code": "a"}]

Should any of the fields be optional, e.g. should it be OK to send an action without an id, and have the remote end use an implicit id for this undefined case? That would almost always be the right thing for keyboards, for example. If that doesn't happen, and the id format is "any string", it seems likely that local ends are going to have to send a uuid as a default id (to prevent it clashing with any later user-defined ids).

Is it intended that the full payload is validated before any actions are taken?
The current spec is specifically written in a way where the actions will partially complete if there is an entry half-way in with the wrong format, but I suspect this is an oversight.

== General Semantics ==

The assumption of the current specification seems to be that for actions that produce internal state, that state can persist for longer than a single API call, and that all such state is removed by the DELETE endpoint. However it's not clear to me if there's a use case for this, or what the semantics are of releasing state (this also applies to the alternate scenario where state is released at the end of each API call). Consider for example:

    [ [{keyDown a}],
      [{keyDown b}] ]

When the state is released, does this work like sending {keyUp a} and {keyUp b} actions? Which order do such implicit actions occur in? Or is the idea that you just purge internal state without having any other effect? This latter option seems problematic if the browser or content assumes that it will always get matched pairs of certain events.

== Pause Action / Temporal Ordering ==

It is unclear to me how the temporal ordering is supposed to work in general. I assume that simultaneous actions are intended to happen left-to-right, top-to-bottom so that given

    [ [{a}, {b}],
      [{c}, {d}] ]

the order of starting each action would be a,b,c,d. However it seems that the pause action can take a duration. What use cases is this supposed to cover, and when exactly do things happen? e.g. if I have (assuming pauses are measured in s for brevity):

    [ [{keyDown a}, {pause 1}, {keyUp a}],
      [{pause 2}, {pointerMove 10 20}, {pause 3}] ]

Should I expect the behaviour to be press down a, 2s pause, instantaneous pointer move to 10,20, 1 second pause, lift a, 3 second pause? Or are the events supposed to happen at some other point relative to the pause (e.g. in the middle of the tick)? Should events like mouseMove be "smeared out" over the tick duration somehow (e.g. a linear interpolation of the position firing an event every 16ms, or using requestAnimationFrame, or whatever)?

== Elements ==

It seems that some actions (keys, events) are supposed to be relative to an element, but there isn't anything in the protocol to specify which element. So, for those actions how does one supply the element?

== Key Actions ==

Fundamentally I am unclear what key event model people want to standardise. I have seen lots of conversations around specific keyboard layouts and IMEs and so on. At the same time many platforms now don't present physical keyboards, and the kind of interaction you get from something like Swype doesn't seem possible to model in the current specification. I think interoperability is possible through a model in which key actions generate (well-specified) DOM events, and above-browser parts of the system (compose key, soft keyboard, IME, etc.) are abstracted away. Is there a strong reason that this simple model is not good enough?

Is it expected that the keyboard model has key repetition? e.g. if I do

    [[{keyDown a}, {pause 10}, {keyUp a}]]

when the focus is on an input control, how many "a" characters should I see?

== Pointer Actions ==

It seems like pointer actions are always specified relative to an element. Is this correct, or should it also be possible to specify them relative to the viewport?

There is an open issue about dispatching touch events and other kinds of events. How will this be handled?

There is probably a lot more to clarify, but that seems like a reasonable start…
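
To make the first reading of the pause example in "Pause Action / Temporal Ordering" concrete, here is a minimal Python sketch of that interpretation only. It is not what the spec says; it assumes that the i-th entries of all sequences form one tick, that non-pause actions fire instantaneously at the start of the tick, and that a tick lasts as long as its longest pause. The tuple representation and the names `timeline`, `key_source` and `pointer_source` are made up for illustration.

```python
from itertools import zip_longest

# Hypothetical in-memory form of the example: one list per input source,
# one entry per tick. Pause durations are in seconds, as in the message.
key_source = [("keyDown", "a"), ("pause", 1), ("keyUp", "a")]
pointer_source = [("pause", 2), ("pointerMove", (10, 20)), ("pause", 3)]


def timeline(sources):
    """Return ([(start_time, action), ...], total_duration).

    Assumed semantics (one possible reading, not the spec's): entry i of
    every source belongs to tick i, non-pause actions are dispatched at the
    start of the tick, and the tick lasts as long as its longest pause.
    """
    events = []
    clock = 0
    # Shorter sources are padded with zero-length pauses.
    for tick in zip_longest(*sources, fillvalue=("pause", 0)):
        duration = 0
        for name, arg in tick:
            if name == "pause":
                duration = max(duration, arg)
            else:
                events.append((clock, (name, arg)))
        clock += duration
    return events, clock


events, total = timeline([key_source, pointer_source])
for start, action in events:
    print(start, action)
print("total", total)
```

Under these assumptions the key goes down at 0s, the pointer moves at 2s, the key comes up at 3s, and the whole sequence takes 6s. The alternatives raised above (dispatching mid-tick, or interpolating pointer moves across the tick) would change where events land inside each tick, which is exactly what the spec needs to pin down.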
Received on Wednesday, 20 April 2016 12:48:09 UTC