
Re: URL parsing

From: Julian Reschke <julian.reschke@gmx.de>
Date: Thu, 29 Apr 2010 15:23:57 +0200
Message-ID: <4BD9886D.7000005@gmx.de>
To: Jonas Sicking <jonas@sicking.cc>
CC: Adam Barth <w3c@adambarth.com>, HTML WG <public-html@w3.org>, Larry Masinter <LMM@acm.org>, "public-iri@w3.org" <public-iri@w3.org>
(re-ccing IRI mailing list; I think for the time being we really need to 
include both mailing lists)

On 28.04.2010 19:31, Jonas Sicking wrote:
> On Wed, Apr 28, 2010 at 8:54 AM, Julian Reschke<julian.reschke@gmx.de>  wrote:
>>> In the case you mention, my recollection is that 3 out of 4 browsers
>>> agree that you should lowercase the scheme.  Based on that evidence,
>>> I'd probably recommend that the wayward browser also lowercase the
>>> scheme.  However, I haven't looked into these issues in enough
>>> detail to know if there are other considerations that might cause us
>>> to prefer that browsers not lowercase the scheme.
>>
>> As far as I understand, HTML5 used to require that no normalization takes
>> place (essentially, it was requiring to slice the ... web address ... into
>> components, and to return them unmodified). I'm not convinced that there's
>> any code out there relying on this...
>
> For what it's worth, whenever we end up defining this, I'm much more
> interested to see tests in relation to what the various browsers with
> substantial usage base do, than what the HTML5 spec said at some point
> in time.

Yes; thus thanks to Adam for starting work on this.

> IIRC Ian has acknowledged that the behavior that was defined by the
> HTML5 spec needed significant work and advised that it was possibly
> better to start from scratch than to base work on the HTML5 spec.

Indeed. On the other hand, what got us to where we are (with a new IRI 
WG) was the claim that significant changes are needed because lots of 
error handling was required to be "compatible with the web".

For instance, we've been told that a "single %" needs to be accepted and 
preserved while parsing. I just checked with IE8; it throws on accessing 
the DOM attributes, and doesn't allow navigation to the target. Testcase:

         <a href="http://localhost:8080/%">Click me.</a>

I suspect we'll find many more cases where there isn't interop for 
broken inputs (and personally I have no problem with just stating that 
and being done with it).
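
To make the "single %" testcase above easier to reproduce outside a 
browser, here is a minimal sketch that flags a "%" not followed by two 
hex digits. The function name malformed_percent_offsets is hypothetical, 
and Python's urllib.parse is only a stand-in parser; it says nothing 
about what IE8 or any other UA does.

```python
import re
from typing import List
from urllib.parse import urlsplit

# A "%" that is not followed by two hex digits is a malformed escape.
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def malformed_percent_offsets(url: str) -> List[int]:
    """Return the offset of every '%' that does not start a valid %XX escape.

    Hypothetical helper for illustration; not taken from any spec or browser.
    """
    return [m.start() for m in _BAD_PERCENT.finditer(url)]


url = "http://localhost:8080/%"

# Split the testcase URL into components with the standard-library parser.
parts = urlsplit(url)
print(parts.path)

# Report where the malformed escapes are, if any.
print(malformed_percent_offsets(url))
```

Running the same check over a corpus of real-world hrefs would be one 
way to get data on how often such inputs actually occur.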

One other thing we'll have to keep in mind is that how UAs handle URIs 
may not be identical to what they expose in the DOM.

> So I wouldn't take the HTML5 spec not matching what browsers do as a
> sign that behavior might not matter. A better indicator is if browsers
> differ in behavior.
>
> Also, as usual, we're fine with changing our implementation in
> firefox, as long as there is data backing up that it's unlikely to
> break the web. Such data could be behavior of other browsers, or data
> based on significant numbers of web pages. And, as usual, there are no
> hard numbers for what constitutes "significant", it'll have to be a
> judgment call on a case by case basis.

Indeed.

Best regards, Julian
Received on Thursday, 29 April 2010 13:24:40 UTC
