Re: Interested in writing up an initial design for bidirectional WebDriver protocol

I also have some feedback on this document (with input from my 
colleagues at Mozilla, but this is not an official position etc.). I'll 
add the feedback below and then — in the next couple of days — start 
opening GitHub issues for the areas where there are decisions to be 
made, since it makes more sense to have the discussion there rather than 
on email.

Feedback by section:

## Goals

* This section captures a number of use cases, but I think there's a 
product-oriented view which is that we should be looking to provide the 
functionality that allows existing remote-automation libraries with 
browser-specific backends (or browser-specific features) to use a 
standard backend. This includes e.g. puppeteer, playwright, cypress, 
selenium, saucelabs.

* Some of these goals like "fail fast on any js error" seem like details 
of possible designs that can be discussed in the relevant features 
rather than top-level goals.

* "Access to native devtools protocol" shouldn't be an explicit goal. 
Having a way to support vendor extensions should be a goal (as you'd 
expect since WebDriver already has this capability). But for Firefox we 
don't anticipate building on the devtools protocol, and a requirement to 
expose that would significantly complicate things for us with little 
gain. Moreover starting a standard with the explicit goal of exposing 
nonstandard parts of implementations seems like it rather misses the 
point; we want a standard featureset that covers all the important 
cross-browser use cases so that people don't have to reach for the 
single-browser escape hatch at the cost of interop.

* The "easy mapping to native devtools protocol" doesn't seem like a 
goal for the protocol; for example we don't expect it to have a trivial 
mapping to RDP used in Firefox. Vendors may of course have constraints 
on the technical direction of the protocol which will derive from a 
shared implementation with their devtools, but those should form part of 
the discussion rather than be an explicit goal.

* An additional goal that we'd like to see is that the BiDi protocol 
ends up as a superset of the HTTP-based protocol i.e. there's never a 
requirement to send HTTP commands to get access to specific 
functionality. From an implementation point of view that would allow the 
HTTP-based and BiDi functionality to end up sharing all the code except 
for the transport layer.

## Transport

* Using full JSON-RPC creates some overhead vs JSON-RPCish JSON. How 
important is it to support existing JSON-RPC libraries vs using "plain" 
JSON without having to e.g. encode a transport-layer version number?

* The way that pipelining works in JSON-RPC seems like it might not be 
optimal. For remote-CI use cases reducing roundtrips on the network is 
important as this can add considerable latency. This is a much higher 
priority for WebDriver than it typically is for devtools where the 
client and host are ~always on the same machine or at least same local 
network (for remote debugging). The JSON-RPC approach looks like you 
have to wait for all the commands to complete before sending a response, 
but coupling the requests and responses like that doesn't obviously 
make sense if it means I can't do something like send two long-running 
scripts and get the results back when they're ready.
We could build something higher-level of course, but that raises the 
question of why we're standardising on something where we're having to 
work around core features.

* The extent to which the JSON-RPC spec uses SHOULD is concerning.

* I think we should just be definite that the transport is over 
websockets. In practice internal infrastructure can do whatever and it 
doesn't matter, but clients and servers must share a common transport 
for interop.

* Having a machine readable API description is good and we should do 
that (WebDriver/HTTP would benefit from this too), but we should also be 
aware that it's a tiny fraction of defining how the API works. All the 
hard work is defining semantics, not typechecking of messages.
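
To illustrate the pipelining point above, here is a minimal Python sketch of the behaviour I'd like to be possible: two long-running commands are sent together and each response comes back as soon as it is ready, matched to its request by `id`. Nothing here is taken from the explainer; `run_command`, `respond_as_ready`, the message shape and the durations are all made up for illustration, and the "remote end" is simulated in-process rather than over a real socket.

```python
import asyncio
import json


async def run_command(command_id, duration, result):
    """Pretend to execute a long-running script, then build its response."""
    await asyncio.sleep(duration)
    # Hypothetical response shape: just an id to correlate on, plus a result.
    return json.dumps({"id": command_id, "result": result})


async def respond_as_ready(commands):
    """Start every command at once and collect responses in completion order."""
    tasks = [asyncio.create_task(run_command(*command)) for command in commands]
    responses = []
    for finished in asyncio.as_completed(tasks):
        responses.append(json.loads(await finished))
    return responses


# Command 1 is sent first but takes longer than command 2.
commands = [(1, 0.2, "slow script done"), (2, 0.05, "fast script done")]
responses = asyncio.run(respond_as_ready(commands))
arrival_order = [response["id"] for response in responses]
print(arrival_order)
```

With a batch-style response the client would see nothing until the slow script finished; here the fast script's result arrives first and the client uses the `id` to work out which request it belongs to.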

## High Level Interface

### Notifications

* The refcounting thing seems confusing. In practice a client that 
allows random code to subscribe or unsubscribe from events seems like 
it's going to be fragile e.g. if some library code subscribes to the 
`foo` events and user code calls unsubscribe without calling subscribe 
it will break the library in spite of refcounting. It's also confusing 
if I can subscribe to `foo` and unsubscribe `foo` and keep getting `foo` 
events because some other part of the program has also called 
`subscribe`. At the protocol level you either get a certain kind of 
event or you don't and I think it makes more sense to allow the clients 
to manage the semantics around when to turn the events on and off rather 
than baking a particular strategy into the protocol.
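
As a rough sketch of what "let the client manage it" could look like, the Python below keeps the protocol to a plain on/off switch per event and does the listener bookkeeping on the client side. The `EventRouter` class, the `subscribe`/`unsubscribe` message shapes and the `send` callback are hypothetical names for illustration only; they are not from the explainer.

```python
class EventRouter:
    """Client-side listener registry on top of a plain on/off protocol."""

    def __init__(self, send):
        self._send = send      # callable that transmits one protocol message
        self._listeners = {}   # event name -> list of callbacks

    def add_listener(self, event, callback):
        listeners = self._listeners.setdefault(event, [])
        if not listeners:
            # First listener for this event: turn it on at the protocol level.
            self._send({"method": "subscribe", "params": {"event": event}})
        listeners.append(callback)

    def remove_listener(self, event, callback):
        listeners = self._listeners.get(event, [])
        if callback not in listeners:
            return  # removing an unknown listener cannot affect anyone else
        listeners.remove(callback)
        if not listeners:
            # Last listener gone: turn the event off at the protocol level.
            self._send({"method": "unsubscribe", "params": {"event": event}})

    def dispatch(self, event, payload):
        for callback in list(self._listeners.get(event, [])):
            callback(payload)


sent, library_seen, user_seen = [], [], []

def library_listener(payload):
    library_seen.append(payload)

def user_listener(payload):
    user_seen.append(payload)

router = EventRouter(sent.append)
router.add_listener("foo", library_listener)
router.add_listener("foo", user_listener)
router.remove_listener("foo", user_listener)
router.dispatch("foo", {"n": 1})
```

User code detaching its own listener stops its own events without breaking the library, and only one `subscribe` message ever goes over the wire. A different client could pick a different strategy without any change to the protocol.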

## Establishing a bidirectional session

* The proposed capability doesn't match the resolutions from previous 
meetings. I don't think it makes sense to add a protocol version number 
here; from the point of view of the rest of the system there isn't 
anything backwards incompatible that requires a version number. We 
should just have a capability that when set means that a websocket 
server is spun up and you get the URL back.
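
To make the shape of that concrete, here is a small Python sketch of a mock remote end handling New Session. The capability name `webSocketUrl`, the host and port, and the `new_session` helper are placeholders chosen for illustration, not names from the explainer; a real remote end would actually start a WebSocket server at the marked point.

```python
import uuid


def new_session(capabilities):
    """Mock remote end: create a session, optionally exposing a WebSocket URL."""
    session_id = str(uuid.uuid4())
    value = {"sessionId": session_id, "capabilities": {}}
    if capabilities.get("webSocketUrl") is True:
        # Hypothetical: this is where the WebSocket server would be spun up.
        url = f"ws://localhost:9222/session/{session_id}"
        value["capabilities"]["webSocketUrl"] = url
    return {"value": value}


plain = new_session({})
bidi = new_session({"webSocketUrl": True})
print(bidi["value"]["capabilities"]["webSocketUrl"])
```

There is no version number anywhere: a client that doesn't set the capability gets exactly the session it gets today.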

## Message Routing

* WebDriver does have the concept of an ID for a browsing context; the 
spec supports the concept of running a script like `return 
document.getElementsByTagName("iframe")[0].contentWindow`. I think in 
practice this is unimplemented, but we could reuse that serialization.

* Not sure if the target terminology makes sense. It doesn't match CDP. 
But some union type referring to things that may contain one or more 
execution contexts (Agents per ES, I think?) does make sense.

* "Identifying elements can work as they do today" seems like it 
undersells the complexity here. The existing protocol basically allows 
serializing json-compatible types or special types like elements, 
windows. CDP allows returning either JSON-compatible types or handles to 
internal objects, or some special case js types like BigInt. Neither 
setup is perfect e.g. CDP is very chatty when doing something like 
returning an array of Elements, but the existing protocol is unable to 
do something like return a promise. We should think clearly about the 
requirements here to ensure we don't end up with a serialization format 
that prevents clients adding features they want.
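
To give a feel for the serialization trade-off in the last point, here is a Python sketch of one possible middle ground: JSON-compatible values are sent inline, integers too large for a JS Number are tagged as bigint, arrays are serialized recursively in a single message, and anything else becomes an opaque handle. The type tags, the handle format and the `serialize` function are all invented for illustration and are not a proposal for the actual wire format.

```python
import itertools

_handles = {}                      # handle id -> object kept alive remotely
_next_handle = itertools.count(1)
MAX_SAFE_INTEGER = 2**53 - 1       # largest integer a JS Number holds exactly


def serialize(value):
    """Serialize a value inline where possible, falling back to a handle."""
    if value is None or isinstance(value, (bool, str, float)):
        return {"type": "primitive", "value": value}
    if isinstance(value, int):
        if abs(value) > MAX_SAFE_INTEGER:
            # Special-cased type: sent as a string so no precision is lost.
            return {"type": "bigint", "value": str(value)}
        return {"type": "primitive", "value": value}
    if isinstance(value, (list, tuple)):
        # One message carries the whole array, including handles to its items.
        return {"type": "array", "value": [serialize(item) for item in value]}
    handle = f"handle-{next(_next_handle)}"
    _handles[handle] = value
    return {"type": "handle", "handle": handle}


class Element:
    """Stand-in for a DOM element living in the browser."""


result = serialize([Element(), Element(), 2**60, "text"])
print(result)
```

An array of elements comes back in one roundtrip rather than one per element, while a value with no JSON form (a promise, say) could still be returned as a handle.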

## Target Discovery

* What's the use case for providing browsing contexts as a tree rather 
than just having a way to get the child contexts under a given context? 
Requiring the tree to be built seems expensive and doesn't help with 
most use cases I can think of e.g. if I'm running some tests, I might 
want to ensure that any new top-level windows are closed after each 
test, but I don't care about frames created within the test. So I'd 
rather just subscribe to all the events relating to top-level windows 
and not care about subframes. It's also sort of unclear how the parent 
relationship is supposed to work; if I create an auxiliary browsing 
context e.g. with window.open() so that the `opener` property is set, is 
that a top level context, or does it have a parent?
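
The kind of API I have in mind can be sketched in a few lines of Python: contexts are flat records with an id and a parent, and the client asks for the children of one context or for the top-level contexts only. The record shape, the ids and both helper functions are hypothetical, and the sketch simply assumes that a window created with window.open() has no parent, which is exactly the open question above.

```python
def child_contexts(contexts, parent_id):
    """Return the ids of contexts whose direct parent is parent_id."""
    return [c["id"] for c in contexts if c["parent"] == parent_id]


def top_level_contexts(contexts):
    """Top-level contexts are the ones with no parent."""
    return child_contexts(contexts, None)


# Flat list as a remote end might report it; no tree is ever built.
contexts = [
    {"id": "win-1", "parent": None},
    {"id": "frame-1", "parent": "win-1"},
    {"id": "frame-2", "parent": "frame-1"},
    {"id": "win-2", "parent": None},   # assumed: opened via window.open()
]

# After a test, close any top-level window that was not open beforehand.
open_before_test = {"win-1"}
to_close = [cid for cid in top_level_contexts(contexts)
            if cid not in open_before_test]
print(to_close)
```

The cleanup step never looks at the frames at all, which is why being forced to materialize the whole tree seems like wasted work for this use case.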






On 11/03/2020 18:38, Simon Stewart wrote:
> Hi,
> 
> I finally had a chance to sit down and read things! Some combined feedback:
> 
> The Good
> • JSON-RPC based
> • Exposing existing WebDriver functionality via the new protocol is good
> • Opens the door to implementing the existing protocol on the new one.
> • I quite like the section on establishing a bidi connection
> • I strongly dislike the “client can infer” approach to figuring out the 
> bidi URL. Much prefer the explicit capability.
> • Also removes the ability for a service provider to redirect to another 
> URL entirely
> • It’s not clear what the client is supposed to do with the WebSocket 
> URL. You can’t make an HTTP GET over WebSockets, it’s just a dumb socket.
> • Generalising the window handle idea to all contexts is good.
> 
> Unsure
> • Using OpenRPC instead of OpenAPI.
> • Reasoning seems okay, but the W3C might want something more formal
> • How does a local end apply to receive multiple events?
> • eg. for the equivalent of a CDP domain?
> • Or do we imagine a flat namespace?
> • The ref counting for event calling would work, but I suspect that 
> it’ll lead to unexpected behaviour
> • eg. in the example they give, the test code would still be receiving 
> the events, even if they’d asked to no longer receive events
> • Typically this is handled by attaching and detaching listeners, though 
> this doesn’t fit with a low-impact way of mapping from the existing CDP 
> approach
> • This seems like an implementation detail. Is it really needed in the 
> spec? Browser implementors are smart enough to turn instrumentation 
> on/off as needed.
> • To establish a bidi connection, we could assume that any WebSocket 
> connection to /session would indicate that the client wants to use bidi
> • Differentiating all the contexts seems easy to implement, but confusing
> • We have a tuple of (id, type, parent) that effectively describes all 
> contexts that are covered in this explainer. I’d suggest leaning into that
> • Internally, WebKit uses a frameID, so this should be 
> straightforward... If a bit confusing and tedious for clients.
> Notes
> • The original wire protocol had numerical status codes. “Nice” to see 
> those are coming back.
> • The “message” in error objects should map to the error strings in the 
> existing spec
> • The “data” in error objects should map to something similar to that in 
> the existing spec.
> • Notably, the session id will be required
> • I imagine command ids will need to increment, though not necessarily 
> monotonically
> • The section about targeting other contexts (eg. webworkers) with the 
> current protocol implies that new functionality will also be available 
> that way
> • Why not have the additional functionality be present only in the new 
> version?
> • I’d suggest keeping the concept of the “default context”
> • Sending commands to an element would imply having switched to that 
> context already
> • Receiving messages from an element implies that the message contains 
> both the element id and the context id.
> • Once upgraded to bidi, do we want the original end points to continue 
> working? Or is upgrading a “one way” operation, and all subsequent 
> commands are expected to be sent via the bidi protocol?
> • I'd prefer to allow both, but this could lead to races if command 
> processing is not serialized as it is now.
> • We need to understand how to communicate multiple error messages in 
> one response (to maintain strict JSON-RPC compat, only one response 
> message expected per command).
> • Web Inspector protocol may already deviate from this and allow 
> multiple errors (or perhaps they are pasted together into one payload)
> • Sending ‘jsonrpc’: “2.0” in every command seems like a waste. Can it 
> be dropped, if we’re already thinking of bending the JSON-RPC protocol…
> 
> Simon
> 
> 
>> On 14 Jan 2020, at 22:35, John Jansen <John.Jansen@microsoft.com 
>> <mailto:John.Jansen@microsoft.com>> wrote:
>>
>> Hey all,
>>
>> We are thinking about talking about this at BlinkOn, and would love 
>> feedback (even a "this is crazy!!" or a "LGTM").
>>
>> If you have a minute to check it out, please let us know what you 
>> think by posting an issue in the github repo.
>>
>> Thanks!
>> -John
>>
>> ------------------------------------------------------------------------
>> *From:*Brandon Walderman <brwalder@microsoft.com 
>> <mailto:brwalder@microsoft.com>>
>> *Sent:*Monday, December 9, 2019 2:38 PM
>> *To:*public-browser-tools-testing@w3.org 
>> <mailto:public-browser-tools-testing@w3.org> 
>> <public-browser-tools-testing@w3.org 
>> <mailto:public-browser-tools-testing@w3.org>>
>> *Subject:*RE: Interested in writing up an initial design for 
>> bidirectional WebDriver protocol
>> Hi folks,
>>
>> I’ve published a pair of explainer documents to the MSEdgeExplainers 
>> repo (link below). The first document, “webdriver.md” outlines our 
>> team’s concept for a bidirectional WebDriver protocol and mostly 
>> explores how the existing WebDriver feature set might look in a 
>> bidirectional world. The next document, “bootstrap-scripts.md” takes a 
>> new feature that we touched on at TPAC 2019 and illustrates how that 
>> feature might work using the protocol outlined in “webdriver.md”. 
>> There’s also some protocol documentation (work-in-progress) alongside 
>> these explainers. I’m looking forward to hearing everyone’s thoughts.
>>
>> https://github.com/MicrosoftEdge/MSEdgeExplainers/tree/master/WebDriverRPC 
>>
>> Thanks,
>> Brandon W.
>>
>> *From:*Brandon Walderman <brwalder@microsoft.com 
>> <mailto:brwalder@microsoft.com>>
>> *Sent:*Friday, October 18, 2019 2:00 PM
>> *To:*public-browser-tools-testing@w3.org 
>> <mailto:public-browser-tools-testing@w3.org>
>> *Subject:*Interested in writing up an initial design for bidirectional 
>> WebDriver protocol
>>
>> Hi folks,
>>
>> I should have some spare cycles in the near future, and I'd like to 
>> take a stab at an initial design for the bidirectional WebDriver 
>> protocol. The purpose would be to get the ball rolling and get some 
>> early feedback on one possible approach. Would anyone mind if I go 
>> ahead with this? It would be in the form of an explainer written in 
>> markdown or google doc for now and I'd share it out as soon as it's 
>> ready to look at. The idea is not to write a spec draft yet but to put 
>> some thoughts on paper and see what others think. How does that sound?
>>
>> Thanks,
>> Brandon W.
> 

Received on Tuesday, 14 April 2020 18:15:41 UTC