Re: The ever contentious capabilities for new sessions

Hi James,

Thanks for the feedback. Comments inline.

On Tue, Sep 13, 2016 at 10:14 PM, James Graham wrote:

> On 12/09/16 21:55, Simon Stewart wrote:
>> Hi,
>> We spend an awful lot of time at F2F sessions on capabilities, but I
>> think Jason Leyba's current suggestion nails almost all the points that
>> have been raised in meetings and in person:
>> Notably:
>>   * Addresses the desire for simple processing of capabilities by end
>>     nodes, with exact matches only.
>>   * Makes it possible to describe several different possibilities.
>>   * Has a different set of blob keys, meaning that the protocol
>>     handshake between OSS, the original spec text, and the new spec text
>>     can be done unambiguously (esp. if end nodes hold to Postel's Law)
>>   * Makes an effort to reduce data being sent across the wire through
>>     the use of the "required capabilities" being merged with "first
>>     match".
>> The biggest downside from my point of view is that this is hard to make
>> 100% backward compatible with the widespread use of selenium, but we
>> could handle iterating over the values on the local end until we ship
>> Selenium 4.
> I think this proposal looks like an improvement over the existing design.
> However I have a number of concerns:
> * I think continuing to describe these as "capabilities" is misleading
> because the name implies semantics that are only relevant to a subset of
> the features (particularly around browser selection and routing). Things like
> timeouts are pure configuration. We should use a more neutral term like
> "parameters".

They're called Capabilities in all the local ends that exist, and changing
the name here causes an interesting mental dissonance. Changing the local
ends is wildly impractical because of their widespread usage (for example,
we still find people using the old RC APIs even though they've been
encouraged to move away from them for about five years).

More relevantly, the blob being sent back _does_ describe the capabilities
of the browser that has been supplied. It's possible to do feature-sniffing
from them and dynamically add interfaces to allow users to access that
functionality (for example, the Augmenter class does this in the selenium

I do agree that there is a mixing of capabilities we need from the end node
(such as being "rotatable" or "cssSelectorsEnabled") with configuration
values. I think the simplicity of shovelling all the values into a single
bucket is an important consideration.

> * I presume the point of passing on browser-selection parameters to the
> browser itself is to enable the browser to re-match on these parameters to
> select only the required subset of parameters, without requiring an
> intermediary node to alter the message. But I think the design here has two
> issues. One is that it is not, in general, cheap to tell which version of a
> browser will be run; to do this from a proxy one needs to actually launch
> the browser and parse out the version number string. This seems relatively
> complicated and I would like to avoid it if possible.

Given that we have a remote protocol, and that end nodes are meant to
understand this protocol, there is no requirement that routing nodes be on
the same machine as the end node (and a strong possibility that they won't
be), so version checks can only really be done by the end node. Figuring
out which version is installed on a machine might be far cheaper than
starting the browser itself: surely the version number of the installed
version of Edge is a lookup in the registry? Safari certainly stores its
version in a plist, which is easy to parse and read quickly. A quick look
at the version of Firefox on my Mac shows that the version number is
encoded in the "application.ini" in three places.
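As a rough sketch of how cheap such a lookup can be, the snippet below reads a
version out of INI-style text without launching anything. It assumes that one
of those places is a "Version" key in an "[App]" section; the sample text and
the firefox_version helper are made up for illustration and are not taken from
any real driver.

```python
import configparser

# Hypothetical excerpt of an application.ini; a real file carries more keys.
SAMPLE_APPLICATION_INI = """\
[App]
Vendor=Mozilla
Name=Firefox
Version=49.0
"""


def firefox_version(ini_text: str) -> str:
    """Return the Version entry of the [App] section (assumed layout)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser["App"]["Version"]


print(firefox_version(SAMPLE_APPLICATION_INI))  # prints 49.0
```

An end node would read the file from the install directory rather than from a
string, but the parsing cost is the same either way.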

> Why do routing intermediaries need special consideration in terms of not
> altering the parameters?

To keep them cheap and easy to implement. Intermediary nodes such as Grid
are already under memory pressure from multiple incoming requests, and
adding additional processing load isn't helpful. In the ideal world, they
could just be a dumb pipe, using the session id from the URL to route
requests to the right place.
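To make the "dumb pipe" idea concrete, here is a minimal sketch of routing on
the URL alone, assuming paths shaped like /session/<id>/... The SESSIONS table,
the node address, and both function names are hypothetical, and this is not how
Grid is actually implemented.

```python
from typing import Dict, Optional

# Hypothetical routing table: session id -> base address of the end node.
SESSIONS: Dict[str, str] = {"abc123": "http://node-1:4444"}


def session_id_from_path(path: str) -> Optional[str]:
    """Extract the id from paths shaped like /session/<id>/..."""
    parts = path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "session":
        return parts[1]
    return None


def route(path: str) -> Optional[str]:
    """Return the upstream URL for a request without reading its body."""
    session_id = session_id_from_path(path)
    node = SESSIONS.get(session_id) if session_id else None
    return node + path if node else None
```

The intermediary never has to parse or rewrite the request body here; it only
looks at the path.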

> The other is that the proposed structure seems rather non-general. If I
> want to specify something large like a base64-encoded profile in a way that
> it only appears in the message once, but where it applies to > 1 but not
> all of the firstMatch parameter sets that isn't possible.

By design. Jason and I discussed this on the #selenium IRC channel. In the
case of Firefox profiles, the common case I've seen is to ensure that a
number of extensions are pre-installed, and those extensions support all
requested versions of Firefox.
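For that common case, a sketch of how the shared ("required") values could be
merged into each firstMatch entry, so that a large profile crosses the wire
only once. The expand function, the key names, and the decision to reject a key
given in both places are assumptions made for illustration, not spec text.

```python
from typing import Any, Dict, List

Caps = Dict[str, Any]


def expand(required: Caps, first_match: List[Caps]) -> List[Caps]:
    """Merge the shared capabilities into each alternative, keeping order."""
    if not first_match:
        return [dict(required)]
    expanded = []
    for choice in first_match:
        # Assumption: a key supplied in both places is treated as an error.
        overlap = required.keys() & choice.keys()
        if overlap:
            raise ValueError(f"keys given twice: {sorted(overlap)}")
        expanded.append({**required, **choice})
    return expanded


# Hypothetical request: one profile shared by two acceptable versions.
candidates = expand(
    {"browser": "firefox", "profile": "<base64String>"},
    [{"version": 48}, {"version": 49}],
)
```

Each entry in candidates is then a complete set that an end node can check
with exact matches only.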

> * It is unclear to me that hard-failing on unrecognised parameters is the
> most backwards compatible thing. In particular I'm wondering about the case
> where a browser introduces a new parameter related to something like e.g.
> logging which is basically always optional. In the scheme described that
> would require duplication for no obvious benefit. Having said that there is
> no recursive validation, so it seems one could always put browser-specific
> configuration under a single parameter and implement whatever semantics
> inside that, without violating the letter of the spec.

Agreed. Some capabilities/parameters are required, and some are desired,
even when matching :) Having said this, it's impossible for an end node to
tell whether a requested capability is one that could be considered
optional (eg. logging) or is actually mandatory (eg. supporting some level
of pointer events)[1]

> * Some details of the way the spec is set up don't make sense. This is a
> holdover from the existing text, but if we are revamping this section we
> should also fix the major structural issues e.g. the table that has
> normative text that is not actually referenced from any section.

We can iterate on this once the initial PR has landed, but I think we're
both agreeing that this makes a great starting point for a productive

> * It may just be that I'm bad at reading the spec as a diff, but it's not
> clear that the algorithm as written actually does the right thing. It seems
> like every option in firstMatch is tried and the value of the last is used,
> irrespective of anything. Apologies if I'm horribly misreading this.

That should be fixed.

> So I think a design similar to this that I would prefer is:
> {
> "routing": [
>   {"browser": "firefox",
>    "platform": "linux"},
>   {"browser": "firefox"},
>   {"browser": "chrome"},
>   {}
> ],
> "settings": [
>     {"timeouts": {"script": 30000}},
>     {"match": {"browser": "firefox", "version": 49},
>      "firefoxOptions": {"prefs": {"dom.disable-open-during-load":false}}
>     },
>     {"match": {"browser": "firefox"},
>      "profile": <base64String>
>     },
>     {"match": {"browser": "chrome"},
>      "binary": "/usr/local/chrome"}
>   ]
> }
> For any intermediary that did routing this would express a preference for
> Firefox on Linux, followed by Firefox on any platform, followed by Chrome,
> followed by anything.

The problem is that _any_ value in the capabilities could theoretically be
used for routing --- perhaps I want a browser that's rotatable. Admittedly,
public services such as Sauce Labs already only switch on browser and
platform name, but that reflects the existing implementations and may not
be true in the future. The testing infrastructure we had at Google meant that there
was no need for any browser, platform, or version numbers of either to be
included --- by the time the test ran, we already knew exactly what the
user was getting.
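To illustrate the first point, a small sketch of exact-match routing in which
any key can take part, not only browser and platform. The node descriptions,
the matches and pick_node helpers, and the values used are hypothetical.

```python
from typing import Any, Dict, List, Optional

Caps = Dict[str, Any]


def matches(node: Caps, requested: Caps) -> bool:
    """Exact match: every requested key is on the node with an equal value."""
    return all(key in node and node[key] == value
               for key, value in requested.items())


def pick_node(nodes: List[Caps], choices: List[Caps]) -> Optional[Caps]:
    """Try each requested set in order; return the first node satisfying one."""
    for requested in choices:
        for node in nodes:
            if matches(node, requested):
                return node
    return None


# Hypothetical node descriptions.
NODES: List[Caps] = [
    {"browser": "firefox", "platform": "linux", "rotatable": False},
    {"browser": "chrome", "platform": "android", "rotatable": True},
]

# Routing purely on "rotatable", with no browser or platform named at all.
rotatable_node = pick_node(NODES, [{"rotatable": True}])
```

Splitting out a fixed set of "routing" keys would rule out a request like the
last one.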

> For browser settings, the options that match would be cumulative, so any
> browser would set the script timeout to 30s, any firefox would use the same
> base profile, firefox 49 would set a specific pref, and Chrome would use a
> specific binary. For matching with the version number one would have to use
> the specified binary, if any so e.g.
> {
> "settings": [
>   {"timeouts": {"script": 30000}},
>   {"match": {"browser": "firefox", "version": 49},
>              "binary": "/home/user/firefox"}
>   ]
> }
> running in firefox would only use the /home/user/firefox binary if that
> binary was a firefox 49 binary (if intermediary nodes could be required to
> edit the new session payload we could sidestep this complexity by requiring
> that they reduce the settings to a list of length 1 with no match clauses
> representing only the things that are known to work. But that does have the
> problem that one can't send the same message irrespective of whether a
> routing intermediary is present).

This cumulative thing seems more complex than Jason's suggestion, and would
mean that each browser would need to prefix browser-specific options with
vendor prefixes. For example, "profile" strikes me as something that would
fall into this --- imagine asking for a "firefox with this profile" and if
that wasn't available "chrome". With your suggestion (if I understand it
properly), we'd attempt to start chrome with a firefox profile. Clearly far
from ideal.



[1] All examples purely hypothetical, and used for illustration

Received on Wednesday, 14 September 2016 14:39:31 UTC