
Re: URL parsing

From: Julian Reschke <julian.reschke@gmx.de>
Date: Wed, 28 Apr 2010 17:54:23 +0200
Message-ID: <4BD85A2F.9010503@gmx.de>
To: Adam Barth <w3c@adambarth.com>
CC: HTML WG <public-html@w3.org>, Larry Masinter <LMM@acm.org>
On 28.04.2010 17:40, Adam Barth wrote:
> ...
> Oh, as I said above, this is "raw data."  The "expected" results are
> just what the author of url_canon_unittest.cc thought the results
> should be.  This data is purely an empirical measurement of what
> browsers actually do.
> ...

OK, thanks for the clarification.

I'm asking because I ran some tests on the URL decomposition
attributes a few months ago
(<http://greenbytes.de/tech/webdav/urldecomp.html>) and found that what
HTML5 describes (or used to describe) didn't seem to be widely
implemented. That led me to question whether we actually *have* to
specify a particular behavior, for instance for broken URIs (such as
one with a single trailing %).
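
To make "broken" concrete: a "%" only starts a valid percent-escape when
two hex digits follow it, so a single trailing "%" can never be decoded.
The following is a minimal sketch of that check, not anything taken from
the test page above; the helper name broken_escape_offsets is
hypothetical.

```python
import re

# A "%" is a valid escape only when followed by exactly two hex digits.
# The negative lookahead matches every "%" that is NOT followed by them.
_BROKEN_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def broken_escape_offsets(url):
    """Return the offset of every "%" that does not start a valid escape.

    Hypothetical helper for illustration; it only inspects the string and
    makes no claim about what any browser does with such input.
    """
    return [m.start() for m in _BROKEN_ESCAPE.finditer(url)]


if __name__ == "__main__":
    for candidate in ("http://example.com/a%20b", "http://example.com/a%"):
        print(candidate, broken_escape_offsets(candidate))
```

An empty list means every "%" in the input is followed by two hex
digits; it says nothing about how such an address ought to be handled.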

> In the case you mention, my recollection is that 3 out of 4 browsers
> agree that you should lowercase the scheme.  Based on that evidence,
> I'd probably recommend that the wayward browser also lowercase the
> scheme.  However, I haven't looked into these issues in enough
> detail to know if there are other considerations that might cause us
> to prefer that browsers not lowercase the scheme.

As far as I understand, HTML5 used to require that no normalization
take place (essentially, it required slicing the ... web address ...
into components and returning them unmodified). I'm not convinced
that there's any code out there relying on this...
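
The difference between the two behaviors can be sketched as follows.
This is an illustration only, assuming the generic component regex from
RFC 3986, Appendix B as the slicing rule; it is not the algorithm from
HTML5 or from any browser, and the names slice_url and lowercase_scheme
are hypothetical.

```python
import re

# Generic URI component regex from RFC 3986, Appendix B.
# Every part is optional, so it matches any input string.
_URI = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def slice_url(url):
    """Slice a URL into components and return them unmodified."""
    m = _URI.match(url)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


def lowercase_scheme(parts):
    """Return a copy of the components with only the scheme lowercased."""
    out = dict(parts)
    if out["scheme"] is not None:
        out["scheme"] = out["scheme"].lower()
    return out


if __name__ == "__main__":
    raw = slice_url("HTTP://Example.COM/a%?q#f")
    print(raw)                    # components exactly as written
    print(lowercase_scheme(raw))  # same, but with a lowercased scheme
```

With pure slicing the scheme comes back exactly as the author wrote it;
the second function changes only the scheme and leaves the other
components, including the broken trailing %, untouched.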

Best regards, Julian
Received on Wednesday, 28 April 2010 15:55:08 GMT
