RE: URL parsing from Larry Masinter on 2010-04-28 (public-html@w3.org from April 2010)

From: Larry Masinter <LMM@acm.org>
Date: Wed, 28 Apr 2010 09:28:46 -0700
To: "'Adam Barth'" <w3c@adambarth.com>, "'Julian Reschke'" <julian.reschke@gmx.de>
Cc: "'HTML WG'" <public-html@w3.org>
Message-ID: <005101cae6ef$e16782f0$a43688d0$@org>

Could you please move this to (or at least cc) the public-iri@w3.org
list?

Thanks,

Larry

-----Original Message-----
From: Adam Barth [mailto:w3c@adambarth.com] 
Sent: Wednesday, April 28, 2010 9:11 AM
To: Julian Reschke
Cc: HTML WG; Larry Masinter
Subject: Re: URL parsing

On Wed, Apr 28, 2010 at 8:54 AM, Julian Reschke
<julian.reschke@gmx.de> wrote:
> On 28.04.2010 17:40, Adam Barth wrote:
>> ...
>> Oh, as I said above, this is "raw data."  The "expected" results are
>> just what the author of url_canon_unittest.cc thought the results
>> should be.  This data is purely an empirical measurement of what
>> browsers actually do.
>> ...
>
> OK, thanks for the clarification.
>
> The reason why I'm asking is because I did some tests with the URL
> decomposition attributes a few months ago
> (<http://greenbytes.de/tech/webdav/urldecomp.html>), and found that what
> HTML5 describes (or used to describe) didn't seem to be widely implemented.
> Which led me to question whether we actually *have* to specify a certain
> behavior, for instance for broken URIs (such as with a single trailing %).

I haven't tested URL decomposition yet, but I'll try to remember to
incorporate your test cases when I do.

>> In the case you mention, my recollection is that 3 out of 4 browsers
>> agree that you should lowercase the scheme.  Based on that evidence,
>> I'd probably recommend that the wayward browser also lowercase the
>> scheme.  However, I haven't looked into these issues in enough
>> detail to know if there are other considerations that might cause us
>> to prefer that browsers not lowercase the scheme.
>
> As far as I understand, HTML5 used to require that no normalization takes
> place (essentially, it was requiring to slice the ... web address ... into
> components, and to return them unmodified). I'm not convinced that there's
> any code out there relying on this...

I don't have any data to share on that question at this time.

Adam

Received on Wednesday, 28 April 2010 16:29:27 UTC