Re: URL parsing

On Wed, Apr 28, 2010 at 8:54 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 28.04.2010 17:40, Adam Barth wrote:
>> ...
>> Oh, as I said above, this is "raw data."  The "expected" results are
>> just what the author of url_canon_unittest.cc thought the results
>> should be.  This data is purely an empirical measurement of what
>> browsers actually do.
>> ...
>
> OK, thanks for the clarification.
>
> The reason why I'm asking is because I did some tests with the URL
> decomposition attributes a few months ago
> (<http://greenbytes.de/tech/webdav/urldecomp.html>), and found that what
> HTML5 describes (or used to describe) didn't seem to be widely implemented.
> Which led me to question whether we actually *have* to specify a certain
> behavior, for instance for broken URIs (such as with a single trailing %).

I haven't tested URL decomposition yet, but I'll try to remember to
incorporate your test cases when I do.

>> In the case you mention, my recollection is that 3 out of 4 browsers
>> agree that you should lowercase the scheme.  Based on that evidence,
>> I'd probably recommend that the wayward browser also lowercase the
>> scheme.  However, I haven't looked into these issues in enough
>> detail to know if there are other considerations that might cause us
>> to prefer that browsers not lowercase the scheme.
>
> As far as I understand, HTML5 used to require that no normalization takes
> place (essentially, it was requiring to slice the ... web address ... into
> components, and to return them unmodified). I'm not convinced that there's
> any code out there relying on this...

I don't have any data to share on that question at this time.

Adam

Received on Wednesday, 28 April 2010 16:12:20 UTC