Re: Go command and invalid URLs

On Fri, 6 May 2016, at 13:39, James Graham wrote:
> In the spec currently it is unclear how URLs are parsed/resolved with 
> the Go command[1]. It is not quite clear from the HTML Navigate 
> algorithm what happens when you pass it something other than an absolute 
> URL, yet we just blindly pass it the user-supplied string. Therefore I 
> conclude that we are responsible for ensuring that we pass it something 
> that is an absolute URL. There are two cases we have to consider:

For the benefit of those not familiar with what existing Selenium
implementations do, FirefoxDriver currently returns an error when passed
a malformed URL but does not support relative URL navigation.
ChromeDriver, I think, accepts both and falls back to the fuzzy matching
of the address bar when it can’t do anything useful with the URL.

It’s useful to keep this in mind: existing implementations do not
appear to agree on how this should behave.

> 1) Strings that cannot be parsed as any kind of URL e.g. "http://a]b" I 
> am of the opinion that we should return an error in these cases because 
> the probability that someone is trying to test invalid url handling is 
> much smaller than the probability that they have made a typo.

The quintessential idea of WebDriver is to try to emulate user
interaction with the browser. We are successful in this only to a
certain extent: There are obvious areas that cannot be standardised
because browsers do not have uniform behaviour.

An example of this is the Close Window command that ends the session
when the last window is closed. This is in reality what happens when you
close the last browser window on Linux and Windows, but not what happens
on Mac. In order to achieve some level of interoperability, WebDriver
decides to also end the session on Mac even though it is technically
possible to have the browser process continue running with no open
windows.

Asking it to navigate to a malformed URL is similar in nature. Many
browsers have so-called intelligent address bars that will do a web
search or perform another action entirely when you enter something that
is clearly not a URL. This is user agent specific behaviour that, for
obvious reasons, we cannot specify.

What the content browser ends up navigating to is the _output_ of the
intelligent address bar. This could be a URL to a search
engine with the input as an encoded parameter, or it could be something
entirely different: Firefox lets you define “keywords” so that for
example “b 1123506” will take you to the Bugzilla bug with the same
number or “w WebDriver” will take you to the Wikipedia entry by the same
name.

On this basis, I think anything that inserts user agent specific steps
between the user’s input string and what the browser ends up navigating
to would move us dangerously close to browser chrome automation, which
is out of scope for WebDriver.

Our navigation model is an _approximation_ of using the address bar to
navigate, not actually entering the URL key by key into the address bar
itself. There is also a future-proofing argument here: future browsers
(think kiosks or experimental browser user interfaces) may not have
address bars at all.

Therefore I agree with James that we ought to return an error when
given a string that cannot be parsed as a URL at all, as defined by the
URL Standard (https://url.spec.whatwg.org/).

Do we need a new error for this? "invalid url" or "malformed url"?
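To make case 1 concrete, here is a minimal sketch in TypeScript. It
assumes the remote end validates input with the WHATWG URL parser, which
is what the global `URL` constructor implements in browsers and Node.js.
The function name `parseAbsoluteUrl` and the result shape are
hypothetical, and "invalid url" is only the name floated above. It
deliberately ignores relative resolution and accepts only absolute URLs.

```typescript
// Result of validating the Go command's input. The "invalid url" code is
// the name proposed in this thread (hypothetical), not part of any spec.
type ParseResult =
  | { ok: true; url: string }
  | { ok: false; error: "invalid url"; message: string };

// Parses user input with the WHATWG URL parser (global URL constructor).
// No base URL is passed, so only absolute URLs can succeed.
function parseAbsoluteUrl(input: string): ParseResult {
  try {
    return { ok: true, url: new URL(input).href };
  } catch {
    // The parser returned failure; report it instead of guessing intent.
    return {
      ok: false,
      error: "invalid url",
      message: `cannot parse ${JSON.stringify(input)} as a URL`,
    };
  }
}

// Usage
console.log(parseAbsoluteUrl("https://www.w3.org/TR/webdriver/")); // ok
console.log(parseAbsoluteUrl("http://a]b")); // "]" is not allowed in a host
console.log(parseAbsoluteUrl("b 1123506")); // address bar keyword, no scheme
```

Inputs like “b 1123506” only mean something to a particular address
bar, so under this approach they are rejected rather than interpreted.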

> 2) Strings that are not absolute URLs, but could be relative urls e.g. 
> "foo" or "example.org". I am of the opinion that we should treat these 
> as relative URLs and resolve them relative to the current document, as 
> would happen if they were in links. This has the benefit of being a 
> useful feature (you can navigate to resources without always having to 
> construct an absolute URL in the client) and being simple to specify. It 
> does mean that people can't pass in schemeless URLs and expect them to 
> be converted into http[s] urls, or whatever, but a client could 
> implement that behaviour if it desired. In the case that a string can't 
> be treated as a relative URL e.g. because the scheme doesn't support 
> relative URLs (e.g. data:), I believe an error should be returned.

I generally like this idea. I think we should support navigation to URLs
relative to the current document’s URL, as with the href of <a> elements.
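A sketch of the resolution James describes, again hypothetical and in
TypeScript: it passes the current document’s URL as the base to the
WHATWG `URL` constructor, assuming the document has no <base> element so
that its base URL equals its URL. `resolveNavigationUrl` and
`documentUrl` are illustrative names, and `null` stands in for whatever
error the Go command would return.

```typescript
// Resolves Go command input against the current document's URL, the way an
// <a> element's href is resolved (assumes no <base> element). Returns null
// where the WHATWG URL parser fails; under the proposal that is an error.
function resolveNavigationUrl(input: string, documentUrl: string): string | null {
  try {
    // An absolute input ignores the base; a relative one is resolved against it.
    return new URL(input, documentUrl).href;
  } catch {
    // Unparsable input, or relative input against a base with an opaque
    // path such as a data: or about:blank URL.
    return null;
  }
}

// Usage
const current = "https://www.w3.org/TR/webdriver/";
console.log(resolveNavigationUrl("foo", current));
// https://www.w3.org/TR/webdriver/foo
console.log(resolveNavigationUrl("example.org", current));
// https://www.w3.org/TR/webdriver/example.org
console.log(resolveNavigationUrl("https://example.org/", current));
// https://example.org/
console.log(resolveNavigationUrl("foo", "data:text/plain,hello"));
// null
```

Note that “example.org” becomes a path under the current document rather
than http://example.org/, which is the trade-off James points out; a
client could still add a scheme itself before sending the command.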

> [1] Side note: I think the name "Get" was better. It causes the browser 
> to GET a particular URL. Seems clear enough.

https://www.w3.org/Bugs/Public/show_bug.cgi?id=29520

Received on Monday, 9 May 2016 16:15:00 UTC