Re: URL parsing in HTML5

From: Anne van Kesteren <annevk@opera.com>
Date: Fri, 04 Nov 2011 08:58:32 -0700
To: "Julian Reschke" <julian.reschke@gmx.de>
Cc: "public-iri@w3.org" <public-iri@w3.org>, public-html-comments@w3.org, "Peter Saint-Andre" <stpeter@stpeter.im>, "Sam Ruby" <rubys@intertwingly.net>, "Paul Cotton (pcotton@microsoft.com)" <pcotton@microsoft.com>, "Ian Hickson" <ian@hixie.ch>, "Michael(tm) Smith" <mike@w3.org>, "Adam Barth" <ietf@adambarth.com>, "Edward O'Connor" <ted@oconnor.cx>
Message-ID: <op.v4finuvc64w2qv@annevk-macbookpro.local>

On Fri, 04 Nov 2011 08:50:07 -0700, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 2011-11-04 16:34, Anne van Kesteren wrote:
>> The outcome you sketch will also result in all other W3C specifications
>> to be implemented by browsers (and even HTTP if it were to be defined in
>> a non-fiction manner) depend on HTML for its definition of URL  
>> processing.
>
> Please stop the "fiction" rhetoric. There's also a lot of fiction in  
> HTML5 (such as requiring rewriting of \ for all URI schemes), and I  
> don't see you arguing about *that*.

I think it only rewrites it for a certain class of URL schemes, but the
details of URL processing are beside the point here. The point is that
URL processing should be uniform. What the exact details of URL processing  
should be is indeed not completely figured out just yet, but it is clear  
that the IETF specifications on the matter are fiction.


> I do agree that URIs leak, but that doesn't necessarily mean that we can  
> have the same processing requirements everywhere. For instance, there  
> are cases where whitespace acts as a delimiter and thus will not be  
> accepted as URI character, no matter how much you want it to.

You keep bringing this example up and I will remind you once again that  
obviously you would have to split on whitespace characters first in such  
cases. This does not affect uniform URL processing in the slightest,
it just means we should either require whitespace characters in URLs to  
always be escaped, or require whitespace characters in URLs to be escaped  
in cases where URLs are whitespace separated.
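
A minimal sketch of the split-first approach, for illustration only. It
assumes Python's urllib.parse as a stand-in for whatever uniform URL
processing ends up being specified; the function names (split_url_list,
escape_whitespace, parse_all) are hypothetical and come from no
specification.

```python
from typing import List
from urllib.parse import SplitResult, quote, urlsplit


def split_url_list(value: str) -> List[str]:
    """Split a whitespace-separated value into individual URL strings."""
    # str.split() with no arguments splits on runs of whitespace
    # and discards empty strings.
    return value.split()


def escape_whitespace(url: str) -> str:
    """Percent-encode literal whitespace so the URL can be placed in a
    whitespace-separated context without being cut in two."""
    return "".join(quote(ch) if ch.isspace() else ch for ch in url)


def parse_all(value: str) -> List[SplitResult]:
    """Split first, then run every piece through the same parser."""
    return [urlsplit(piece) for piece in split_url_list(value)]


if __name__ == "__main__":
    urls = ["http://a.example/a b", "http://b.example/c"]
    # Escape before joining, so the delimiter stays unambiguous.
    joined = " ".join(escape_whitespace(u) for u in urls)
    for parts in parse_all(joined):
        print(parts.netloc, parts.path)
```

The splitting step belongs to the surrounding format, not to URL
processing: each piece goes through the same parser whether it came from
a list or stood alone.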


-- 
Anne van Kesteren
http://annevankesteren.nl/
Received on Friday, 4 November 2011 16:00:08 GMT