Re: how browsers transform URLs from Erik van der Poel on 2009-11-26 (uri@w3.org from November 2009)

From: Erik van der Poel <erikv@google.com>
Date: Wed, 25 Nov 2009 22:28:52 -0800
To: Mike Brown <mike@skew.org>
Cc: uri@w3.org
Message-ID: <c07a32650911252228s60030725w66694fc4aaca68ad@mail.gmail.com>

Mike,

Thanks for the email. You're absolutely right that the recommendations
document is missing detailed steps regarding (base + relative ->
absolute). We definitely need to do more tests with the JavaScript
interfaces (this.href, this.getAttribute('href' [, n])) since
Microsoft's documentation on getAttribute mentions absolute and
relative URLs. Until now, most of our testing has been focussed on the
DNS and HTTP packets rather than the JavaScript calls. Also, we don't
have any relative URL tests yet (though I have already found that the
browsers handle the <base> tag with a relative URL inside it
differently).
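
For illustration only, here is a minimal Python sketch of the ordering Mike describes in his message below: clean up the reference, resolve it against the base, and only then split it into components. The function name resolve_reference and the cleanup rule (strip surrounding whitespace, drop embedded tab/CR/LF) are my own assumptions, not what any browser or the recommendations document actually does. The standard library's urljoin stands in for the RFC 3986 section 5 resolution step.

```python
from urllib.parse import urljoin, urlsplit


def resolve_reference(base_uri, reference):
    """Clean up a URI reference, resolve it against base_uri, then split it."""
    # Step 1 (assumed cleanup rule, for illustration only): strip surrounding
    # whitespace and remove embedded tab, CR and LF characters.
    cleaned = reference.strip()
    for ch in "\t\r\n":
        cleaned = cleaned.replace(ch, "")

    # Step 2: resolve the (possibly relative) reference to absolute form.
    absolute = urljoin(base_uri, cleaned)

    # Step 3: only now identify the discrete components.
    return urlsplit(absolute)


parts = resolve_reference("http://example.com/a/b/c.html", "  ../d/e.html?x=1 ")
print(parts.geturl())
print(parts.hostname, parts.path, parts.query)
```

The base here plays the role of the document's URI (or of a <base> value); a relative <base> would itself need resolving first, which this sketch does not attempt.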

I'm not sure what you mean by "I was surprised that IE simply ignores
a lone "%" and that it passes through control characters in query
strings." Are you talking about a lone % in the path part of the URL?
(In the query part, most of the browsers simply allow a lone %.)

If you are talking about a lone % in the path part, I have to admit
that I am also surprised about IE's behavior. (What was their reason
for rejecting such URLs?)
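
To make the path/query distinction concrete, here is a small Python sketch that reports where a lone "%" (one not followed by two hex digits) occurs in each part. The helper name find_lone_percents and the sample URL are made up for this example; it only locates such characters and says nothing about what any browser does with them.

```python
import re
from urllib.parse import urlsplit

# A "%" that is not followed by exactly two hexadecimal digits.
LONE_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def find_lone_percents(url):
    """Return the offsets of lone "%" characters in the path and in the query."""
    parts = urlsplit(url)
    return {
        "path": [m.start() for m in LONE_PERCENT.finditer(parts.path)],
        "query": [m.start() for m in LONE_PERCENT.finditer(parts.query)],
    }


print(find_lone_percents("http://example.com/a%b/c%20d?x=100%&y=%41"))
```

Offsets are relative to the start of the path and of the query string respectively, not to the whole URL.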

Erik

On Wed, Nov 25, 2009 at 5:44 PM, Mike Brown <mike@skew.org> wrote:
> Erik van der Poel wrote:
>> We are happy to announce the open source release of Client URL
>> Internet Emission Sniffer (CURLIES).
>
> Interesting project. Just a couple of cursory observations:
>
> In the recommendations for browser developers, section 6 assumes the URL in the
> href or whatever is absolute. It's really a URI reference, and may be relative.
> In that situation, it's relative to the HTML document's URI, which is usually
> found external to the document but can be set elsewhere; see RFC 3986 section
> 5. Regardless of whether it's relative or absolute, there's a procedure for
> converting it to an absolute one (one definition of 'resolution'), and this
> needs to be accounted for in your procedures. For example, it seems some
> cleanup of poorly written URI references (whitespace & bad characters) needs
> to be done, then the resolving to absolute form needs to happen, and only then
> can you start identifying and processing the URI's discrete components (host,
> path, params, whatever).
>
> Anyway, the ASCII test results for path and query were interesting. I'm not
> surprised that "%00" and \x00 were funky, and although it wasn't what I
> expected, I can understand why "\" would be converted to "/" by most browsers
> (except in Firefox, where it's "%5C"). I was surprised that IE simply ignores
> a lone "%" and that it passes through control characters in query strings.
>
> I'm looking forward to seeing what evolves in the "Handle the Rest" section of
> the recommendations :)
>
> Mike
>

Received on Thursday, 26 November 2009 06:29:26 UTC