Re: The ever contentious capabilities for new sessions from Simon Stewart on 2016-09-18 (public-browser-tools-testing@w3.org from July to September 2016)

From: Simon Stewart <simon.m.stewart@gmail.com>
Date: Sun, 18 Sep 2016 21:32:30 +0100
To: James Graham <james@hoppipolla.co.uk>
Cc: public-browser-tools-testing@w3.org
Message-Id: <5E4C80F3-5A71-49DE-A3F6-9431002CCB48@gmail.com>
Inline, and pecked out on my phone. My apologies for lack of brevity and weird autocorrects. 



Sent from my iPhone
> On 14 Sep 2016, at 16:21, James Graham <james@hoppipolla.co.uk> wrote:
> 
>> On 14/09/16 15:39, Simon Stewart wrote:
>> 
>>    For browser settings, the options that match would be cumulative, so
>>    any browser would set the script timeout to 30s, any firefox would
>>    use the same base profile, firefox 49 would set a specific pref, and
>>    Chrome would use a specific binary. For matching with the version
>>    number one would have to use the specified binary, if any so e.g.
>> 
>>    {
>>    "settings": [
>>      {"timeouts": {"script": 30000}},
>>      {"match": {"browser": "firefox", "version": 49},
>>                 "binary": "/home/user/firefox"}
>>      ]
>>    }
>> 
>>    running in firefox would only use the /home/user/firefox binary if
>>    that binary was a firefox 49 binary (if intermediary nodes could be
>>    required to edit the new session payload we could sidestep this
>>    complexity by requiring that they reduce the settings to a list of
>>    length 1 with no match clauses representing only the things that are
>>    known to work. But that does have the problem that one can't send
>>    the same message irrespective of whether a routing intermediary is
>>    present).
>> 
>> 
>> This cumulative thing seems more complex than Jason's suggestion, and
>> would mean that each browser would need to prefix browser-specific
>> options with vendor prefixes. For example, "profile" strikes me as
>> something that would fall into this --- imagine asking for a "firefox
>> with this profile" and if that wasn't available "chrome". With your
>> suggestion (if I understand it properly), we'd attempt to start chrome
>> with a firefox profile. Clearly far from ideal….
> 
> I don't think I can have explained it properly because that certainly wasn't the intent.
> 
> The major design goal here is to separate out as much as possible two largely orthogonal things. One is routing a request to a specific end node, the other is setting up that end node according to what's needed for the tests.

The problem is that any capability can also be a key for routing a new session.

Worse, although an intermediary node may believe that "win10-chrome-latest" is a VM running latest chrome on Windows 10, there's no guarantee, so the end node would be required to check those capabilities anyway. 

> Routers only look at the "routing" key (ignoring any other top level key they require to perform additional functions like authentication). This contains an ordered list of objects describing the preferred end node type to select. These object could in principle contain anything that the router could select on so if some intermediary wanted to add, say, a "platformType" key with values "desktop", "mobile", "tablet" and "tv screen in Times Square", that would be fine. However it seems like the spec is going to specifically mention four common settings around browser, platform and versions of those.

Right. We're deliberately minimising anything that's not related to the browser that's running. 

> The other part of the proposal is aimed at end nodes only. End nodes only look at the "settings" key. This is a list consisting of objects that may contain a property called "match". If "match" is missing or null the remaining properties on the object are applied as settings irrespective of the properties of the end node. Otherwise the user agent evaluates the properties of the "match" object against the properties of the browser that would be launched given that all previously matched settings plus the current set of settings are applied (this complexity matters mostly because of version numbers and the ability to provide custom binary paths). If those properties are a match for the current browser the settings described in the rest of the object will be applied. This means that you never get settings for chrome and firefox mixed up, for example.
> 
> I think this part is easier to understand with an illustration:
> 
> Imagine I send
> 
> {"settings": [
>  {"timeouts": {"script": 30000}},
>  {"match": {"browser":"firefox"}
>   "profile": "<someFirefoxProfile>"},
>  {"match": {"browser": "chrome"},
>   "profile": "<someChromeProfile>"},
>  {"match": {"browser": "firefox", "platform": "linux",
>             "browserVersion": 47},
>   "binary": "/usr/bin/firefox",
>   "prefs": {"dom.serviceworker.enabled": true}
> }
> ]}
> 
> Let's say geckodriver gets the above message so the browser that will be launched is Firefox and that we know we are running on Linux.
> 
> So we go through the list in order:
>  - {"timeouts": {"script": 30000}} has no "match" clause so it always applies. Our final settings object is set to
> {"timeouts": {"script": 30000}}.
> 
> - {"match": {"browser":"firefox"}
>    "profile": "<someFirefoxProfile>"} has a "match" clause on the browser name which will match, so we also add these settings. The final settings object becomes
> {"timeouts": {"script": 30000},
>  "profile": "<someFirefoxProfile>"}.
> 
> - {"match": {"browser": "chrome"},
>    "profile": "<someChromeProfile>"} has a similar match clause to the previous one, but this time doesn't result in a match, so the final settings object is unchanged.
> 
> 
> - {"match": {"browser": "firefox", "platform": "linux",
>              "browserVersion": 47},
>   "binary": "/usr/bin/firefox",
>   "prefs": {"dom.serviceworker.enabled": true} is the complex case. Browser and platform match, but version is more complex. In this case we have to form a candidate settings object:
>  {"timeouts": {"script": 30000},
>   "profile": "<someFirefoxProfile>"
>   "binary": "/usr/bin/firefox",
>   "prefs": {"dom.serviceworker.enabled": true}
> Now we inspect this binary and check if it is a Firefox 47 binary. In the case that it is the final settings object is the candidate settings object, otherwise it remains unchanged from the step before.
> 
> The settings we use for configuring the remote end are then those described by the final settings object.
> 
> I think that this isn't actually any more complex than the other model. In particular I have made a point of demonstrating the hardest possible cases that other models don't handle well e.g. routing on one set of properties by configuring the end nodes based on a different set of properties. But in the case where you already know what kind of end node you will get you simply send
> 
> {
> "settings": [
>    {"some setting": "some value"}
>  ]
> }
> 
> (one can imagine a microoptimisation to allow the list to be omitted in the case that there is exactly one entry).

Hmm… I think I'm starting to get a better understanding of your underlying discomfort. LMK if I'm wildly off-base :)

As it stands, the capabilities blob is a combination of configuration settings used by the browser at run time (timeouts, proxies), settings that are used when starting the browser (profiles, proxies), and information that is most commonly, but not exclusively, used by systems such as Grid, SauceLabs, BrowserStack, and so on for selecting a VM or other intermediary node (browser name). That blob may also contain anything in the spec that has been given a capability name.

You're suggesting changing this out to a set of matchers and setting objects that may or may not apply. 

If this is so, I think the conversation about "routing" isn't helping as that's the thing we're focusing on. 

As it stands in OSS webdriver, we have a "required" capability object and a "desired" capability object (we only ever used to have a single "desired" one). The idea with this design was that remote end would do a "best effort" to match the capabilities, and the local end would validate that enough had been met. 

Earlier similar systems had separated out querying for the availability of the resource from actually using that resource, which was an inherently "racey" design that failed in pretty obvious ways. To minimise network requests, OSS webdriver chose a model where resource acquisition is initialisation, which meant we also needed all configuration data in the same request. Since failing to match one of these configuration options was also a potential error, they could comfortably be added to the capabilities blob. 

The problem with the first version of this design was that all validation had to be done on the local end. It's more efficient in a distributed system to push the validation to be as close to the end node as possible, which is why we added "required" properties. 

The PR that we're currently considering allows us to push all of the validation to the end node, which is the only node in the system that actually knows which capabilities have been met, and which configuration objects have been used, including OS and browser versions and names. This is a highly desirable property, mainly because all the validation required can be done on the end node. 

There a number of advantages to the proposed design: 

1. it avoids smearing responsibility for validation throughout the system, making local and intermediary nodes simpler. 

2. No ambiguity: either all capabilities match or there's no match at all. 

3. Easy to implement: it's a for loop with a simple merge

4: relatively compatible with existing local ends. 

To my eyes, your suggestion reduces clarity in the new session payload while still possibly holding point 2 true, but then definitely doesn't meet point 4. As someone working on intermediary nodes, I don't think 3 is as easy with your suggestion. Provided we don't want nodes deleting data, 1 is equally true for both proposals. 

Let me know if I've understood your idea correctly. 

Simon
Received on Sunday, 18 September 2016 20:33:03 UTC