Re: URL parsing from Adam Barth on 2010-04-28 (public-html@w3.org from April 2010)

From: Adam Barth <w3c@adambarth.com>
Date: Wed, 28 Apr 2010 09:11:18 -0700
To: Julian Reschke <julian.reschke@gmx.de>
Cc: HTML WG <public-html@w3.org>, Larry Masinter <LMM@acm.org>
Message-ID: <s2u5c4444771004280911v53239eb8rd0e060d657b40f95@mail.gmail.com>

On Wed, Apr 28, 2010 at 8:54 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 28.04.2010 17:40, Adam Barth wrote:
>> ...
>> Oh, as I said above, this is "raw data."  The "expected" results are
>> just what the author of url_canon_unittest.cc thought the results
>> should be.  This data is purely an empirical measurement of what
>> browsers actually do.
>> ...
>
> OK, thanks for the clarification.
>
> The reason why I'm asking is because I did some tests with the URL
> decomposition attributes a few months ago
> (<http://greenbytes.de/tech/webdav/urldecomp.html>), and found that what
> HTML5 describes (or used to describe) didn't seem to be widely implemented.
> Which led me to question whether we actually *have* to specify a certain
> behavior, for instance for broken URIs (such as with a single trailing %).

I haven't tested URL decomposition yet, but I'll try to remember to
incorporate your test cases when I do.

>> In the case you mention, my recollection is that 3 out of 4 browsers
>> agree that you should lowercase the scheme.  Based on that evidence,
>> I'd probably recommend that the wayward browser also lowercase the
>> scheme.  However, I haven't looked into these issues in enough
>> detail to know if there are other considerations that might cause us
>> to prefer that browsers not lowercase the scheme.
>
> As far as I understand, HTML5 used to require that no normalization takes
> place (essentially, it required slicing the ... web address ... into
> components, and to return them unmodified). I'm not convinced that there's
> any code out there relying on this...

I don't have any data to share on that question at this time.
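
To make the two behaviors under discussion concrete, here is a minimal
sketch in Python.  It is only an illustration: the function names
split_scheme and split_scheme_normalized are hypothetical, and the code
is not taken from HTML5, url_canon_unittest.cc, or any browser.  The
first function slices off the scheme and returns it unmodified; the
second lowercases it.

```python
from typing import Tuple


def split_scheme(url: str) -> Tuple[str, str]:
    """Slice a URL at the first ':' and return (scheme, rest) unmodified.

    Hypothetical helper: models the "no normalization" behavior, where
    components are returned exactly as they appear in the input.
    """
    scheme, sep, rest = url.partition(":")
    if not sep:
        # No ':' found, so there is no scheme to report.
        return "", url
    return scheme, rest


def split_scheme_normalized(url: str) -> Tuple[str, str]:
    """Same slicing as split_scheme, but with the scheme lowercased."""
    scheme, rest = split_scheme(url)
    return scheme.lower(), rest


if __name__ == "__main__":
    example = "HTTP://example.com/"
    print(split_scheme(example))
    print(split_scheme_normalized(example))
```

For the input "HTTP://example.com/", the first function reports the
scheme as "HTTP" and the second as "http".  Script code that compares
the returned scheme against a fixed string would see a difference
between the two.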

Adam

Received on Wednesday, 28 April 2010 16:12:20 UTC