[whatwg] URL decomposition on HTMLAnchorElement interface

Kartikaya Gupta wrote:
> I was trying different things to see what happens and came across some particularly weird behavior in Gecko/2009021910 Firefox/3.0.7:
>
> var a = document.createElement('a');
> a.setAttribute('href', 'http://example.org:123/foo?bar#baz');
> a.hostname = null;
> alert(a.hostname);       // displays "foo"
> alert(a.href);           // displays "http://foo/?bar#baz"

Indeed.  The behavior you're seeing is due to setting the hostname to the
empty string, basically.  That said, this code should probably bail out
when that happens instead of pressing on.  I've filed
https://bugzilla.mozilla.org/show_bug.cgi?id=485562 on this.
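
For illustration only, here is a minimal sketch of what "bailing out" could look like. It is not the Gecko code: setHostnameSafely is a hypothetical helper, it uses the standard URL class (which postdates this thread) rather than an <a> element, and it assumes that null should be treated like the empty string, as in the behavior described above.

```javascript
// Hypothetical helper: change the hostname of an absolute URL string,
// but leave the URL untouched when the new hostname is null/empty.
function setHostnameSafely(href, newHostname) {
  // Assumption: null and undefined are treated like the empty string.
  const value =
    newHostname === null || newHostname === undefined ? '' : String(newHostname);

  // Bail out instead of pressing on with an empty hostname.
  if (value === '') {
    return href;
  }

  const url = new URL(href); // throws on an unparseable href
  url.hostname = value;
  return url.href;
}
```

With this guard, passing null for 'http://example.org:123/foo?bar#baz' returns the string unchanged, which roughly matches "ignore the call" as described for Safari.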

Interestingly, it looks like Opera doesn't support the hostname setter 
at all.  Safari ignores the call in this case.  I don't have IE to test 
offhand.


> a.setAttribute('href', 'scheme://host/path');
> a.host = null;
> alert(a.host);           // displays ""
> alert(a.pathname);       // displays ""
> alert(a.href);           // displays "scheme:////host/path"

This case is more fun.  It's an unknown scheme, so it's assumed to be a 
no-authority non-hierarchical scheme and the URI is parsed that way. 
This does cause issues, since RFC 3986 says that if there is no authority 
then the path cannot begin with two slashes (so if "scheme" is a 
non-authority protocol then the URI is invalid, in fact).  But deciding 
whether this is an invalid URI or not involves knowing something about 
the "scheme" protocol, which is rather hard in this case, since you just 
made it up.  ;)
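
To make the RFC 3986 point concrete, here is a rough sketch of why a no-authority URI cannot have a path starting with two slashes. It uses the generic splitting regular expression from RFC 3986 Appendix B and the recomposition steps from section 5.3. The function names (splitUri, recompose, violatesNoAuthorityRule) are made up for this example, and none of this is how any browser's parser actually works.

```javascript
// Generic URI-splitting regular expression from RFC 3986, Appendix B.
const URI_PARTS = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Split a URI reference into its five generic components.
// Absent components are undefined; the path is always a string.
function splitUri(uri) {
  const m = URI_PARTS.exec(uri); // every part is optional, so this always matches
  return {
    scheme: m[2],
    authority: m[4], // undefined when there is no "//" after the scheme
    path: m[5],
    query: m[7],
    fragment: m[9],
  };
}

// Component recomposition as described in RFC 3986, section 5.3.
function recompose(c) {
  let out = '';
  if (c.scheme !== undefined) out += c.scheme + ':';
  if (c.authority !== undefined) out += '//' + c.authority;
  out += c.path;
  if (c.query !== undefined) out += '?' + c.query;
  if (c.fragment !== undefined) out += '#' + c.fragment;
  return out;
}

// The rule in question: with no authority, the path must not begin with "//".
function violatesNoAuthorityRule(c) {
  return c.authority === undefined && c.path.startsWith('//');
}
```

If "scheme" really had no authority and the path were "//host/path", recompose would produce "scheme://host/path", and splitting that string again yields authority "host" and path "/path". The components do not survive a round trip, which is the ambiguity the rule is there to prevent.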

In general, parsing a URI for a scheme you know nothing about is a huge 
pain, especially if your URL parser is expected to do fixup on invalid 
URIs (which the parser for the "href" attribute of <a> is certainly 
expected to do).

-Boris

Received on Friday, 27 March 2009 11:14:35 UTC