Re: how browsers transform URLs

Erik van der Poel wrote:
> We are happy to announce the open source release of Client URL
> Internet Emission Sniffer (CURLIES).

Interesting project. Just a couple of cursory observations:

In the recommendations for browser developers, section 6 assumes the URL in the 
href or whatever is absolute. It's really a URI reference, and may be relative. 
In that situation, it's relative to the HTML document's base URI, which is 
usually established outside the document (e.g. the URI used to retrieve it) but 
can be set within it (e.g. by a base element); see RFC 3986 section 5. 
Regardless of whether it's relative or absolute, there's a procedure for 
converting it to an absolute one (one definition of 'resolution'), and this 
needs to be accounted for in your procedures. For example, it seems some 
cleanup of poorly written URI references (whitespace & bad characters) needs 
to be done first, then the resolution to absolute form needs to happen, and 
only then can you start identifying and processing the URI's discrete 
components (host, path, params, whatever).
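
To make that ordering concrete, here is a minimal Python sketch using the 
standard library's urllib.parse. The function names (clean_reference, 
resolve_and_split), the example.com URLs, and the particular cleanup rules are 
my own assumptions for illustration; they are not taken from CURLIES or from 
what any particular browser does.

```python
from urllib.parse import quote, urljoin, urlsplit

# Characters left untouched by the cleanup: RFC 3986 reserved characters
# plus "%" so existing percent-escapes are not double-encoded.
# (Unreserved characters are never escaped by quote().)
_SAFE = "%:/?#[]@!$&'()*+,;="


def clean_reference(ref):
    """Hypothetical cleanup step: trim whitespace, escape bad characters."""
    ref = ref.strip()
    # Drop embedded tabs and line breaks, which sloppy markup often contains.
    for ch in "\t\r\n":
        ref = ref.replace(ch, "")
    # Percent-encode whatever remains that is not allowed in a URI.
    return quote(ref, safe=_SAFE)


def resolve_and_split(ref, base):
    """Clean, then resolve against the base URI, then split into components."""
    absolute = urljoin(base, clean_reference(ref))
    return absolute, urlsplit(absolute)


base = "http://example.com/docs/page.html"  # assumed base URI of the document
absolute, parts = resolve_and_split("  ../img/a b.png?x=1 ", base)
print(absolute)
print(parts.netloc, parts.path, parts.query)
```

The point is only the order of operations: the host, path and query are read 
off the resolved URI, never off the raw attribute value. The sketch leaves a 
lone "%" and backslashes alone, since how to treat those is exactly the kind 
of thing the test results show browsers disagreeing on.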

Anyway, the ASCII test results for path and query were interesting. I'm not 
surprised that "%00" and \x00 were funky, and although it wasn't what I 
expected, I can understand why "\" would be converted to "/" by most browsers 
(except in Firefox, where it's "%5C"). I was surprised that IE simply ignores 
a lone "%" and that it passes through control characters in query strings.

I'm looking forward to seeing what evolves in the "Handle the Rest" section of 
the recommendations :)

Mike

Received on Thursday, 26 November 2009 01:45:34 UTC