Re: URL parsing in HTML5 from Julian Reschke on 2011-11-04 (public-html-comments@w3.org from November 2011)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Fri, 04 Nov 2011 17:25:15 +0100
To: Anne van Kesteren <annevk@opera.com>
CC: "public-iri@w3.org" <public-iri@w3.org>, public-html-comments@w3.org, Peter Saint-Andre <stpeter@stpeter.im>, Sam Ruby <rubys@intertwingly.net>, "Paul Cotton (pcotton@microsoft.com)" <pcotton@microsoft.com>, Ian Hickson <ian@hixie.ch>, "Michael(tm) Smith" <mike@w3.org>, Adam Barth <ietf@adambarth.com>, Edward O'Connor <ted@oconnor.cx>
Message-ID: <4EB411EB.4010000@gmx.de>

On 2011-11-04 16:58, Anne van Kesteren wrote:
> On Fri, 04 Nov 2011 08:50:07 -0700, Julian Reschke
> <julian.reschke@gmx.de> wrote:
>> On 2011-11-04 16:34, Anne van Kesteren wrote:
>>> The outcome you sketch will also result in all other W3C specifications
>>> to be implemented by browsers (and even HTTP if it were to be defined in
>>> a non-fiction manner) depending on HTML for their definition of URL
>>> processing.
>>
>> Please stop the "fiction" rhetoric. There's also a lot of fiction in
>> HTML5 (such as requiring rewriting of \ for all URI schemes), and I
>> don't see you arguing about *that*.
>
> I think it only rewrites it for a certain class of URL schemes, but the

...yes: "If result uses a scheme with a server-based naming authority..."

> details of URL processing are beside the point here. The point is that
> URL processing should be uniform. What the exact details of URL
> processing should be is indeed not completely figured out just yet, but
> it is clear that the IETF specifications on the matter are fiction.

Well, so is what the HTML spec says. The problem is pretending that it's
possible to agree on the same error handling for everybody.

We spent tons of emails on the IRI mailing list to figure out *which* 
"willful violations" of RFC 3986 UA implementers agree on, and didn't
really find a lot.

>> I do agree that URIs leak, but that doesn't necessarily mean that we
>> can have the same processing requirements everywhere. For instance,
>> there are cases where whitespace acts as a delimiter and thus will not
>> be accepted as URI character, no matter how much you want it to.
>
> You keep bringing this example up and I will remind you once again that
> obviously you would have to split on whitespace characters first in such
> cases. This does not affect uniform URL processing in the slightest,
> it just means we should either require whitespace characters in URLs to
> always be escaped, or require whitespace characters in URLs to be
> escaped in cases where URLs are whitespace separated.

It means that you have at least *two* processing algorithms, no matter 
how you rephrase it .-)
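
To illustrate, here is a minimal sketch (not taken from any spec). The
function names process_single and process_list are made up for this
example, and the escaping done with Python's urllib.parse.quote merely
stands in for whatever the shared URL processing step would turn out to be:

```python
from urllib.parse import quote

# Characters left untouched; "%" is included so existing escapes survive.
SAFE = ":/?#[]@!$&'()*+,;=%~"

def process_single(value):
    """Context holding exactly one URL: trim the ends, escape inner spaces."""
    return quote(value.strip(), safe=SAFE)

def process_list(value):
    """Whitespace-separated context: split first, then process each token."""
    return [process_single(token) for token in value.split()]

same_input = "http://example.com/a b"
print(process_single(same_input))  # one URL, with the space escaped
print(process_list(same_input))    # two tokens, split at the space
```

The same input comes out as one URL in the first context and as two
tokens in the second, so the caller has to know which entry point applies.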

Best regards, Julian

Received on Friday, 4 November 2011 16:25:54 UTC