
Re: URL parsing in HTML5

From: Anne van Kesteren <annevk@opera.com>
Date: Fri, 04 Nov 2011 09:35:04 -0700
To: "Julian Reschke" <julian.reschke@gmx.de>
Cc: "public-iri@w3.org" <public-iri@w3.org>, public-html-comments@w3.org, "Peter Saint-Andre" <stpeter@stpeter.im>, "Sam Ruby" <rubys@intertwingly.net>, "Paul Cotton (pcotton@microsoft.com)" <pcotton@microsoft.com>, "Ian Hickson" <ian@hixie.ch>, "Michael(tm) Smith" <mike@w3.org>, "Adam Barth" <ietf@adambarth.com>, "Edward O'Connor" <ted@oconnor.cx>
Message-ID: <op.v4fkcqgo64w2qv@annevk-macbookpro.local>
On Fri, 04 Nov 2011 09:25:15 -0700, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 2011-11-04 16:58, Anne van Kesteren wrote:
>> details of URL processing are beside the point here. The point is that
>> URL processing should be uniform. What the exact details of URL
>> processing should be is indeed not completely figured out just yet, but
>> it is clear that the IETF specifications on the matter are fiction.
> Well, so is what the HTML spec says. The problem is to pretend that it's  
> possible to agree on the same error handling for everybody.

We have crossed that bridge for much more complex problems, such as HTML  
parsing, so I think it should be doable.

> We spent tons of emails on the IRI mailing list to figure out *which*  
> "willful violations" of RFC 3986 UA implementers agree on, and didn't  
> really find a lot.

That does not mean we do not want to converge.

>> You keep bringing this example up and I will remind you once again that
>> obviously you would have to split on whitespace characters first in such
>> cases. This does not affect uniform URL processing in the slightest,
>> it just means we should either require whitespace characters in URLs to
>> always be escaped, or require whitespace characters in URLs to be
>> escaped in cases where URLs are whitespace separated.
> It means that you have at least *two* processing algorithms, no matter  
> how you rephrase it .-)

Yes, you need two algorithms because standalone URLs and  
whitespace-separated URLs are distinct. What are you trying to say?
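
To make the distinction concrete, here is a rough sketch of how the two entry points could share one URL processing step. It is only an illustration under stated assumptions: the function names are hypothetical, the whitespace set is assumed to be HTML's ASCII whitespace, and percent-escaping leftover whitespace stands in for whatever the uniform processing would actually do.

```python
import re
from typing import List

# Assumption: "whitespace" means HTML's ASCII whitespace (tab, LF, FF, CR, space).
ASCII_WHITESPACE = "\t\n\x0c\r "


def process_url(raw: str) -> str:
    """Hypothetical stand-in for the single, uniform URL processing step."""
    # Drop leading and trailing whitespace.
    stripped = raw.strip(ASCII_WHITESPACE)
    # Assumption for illustration: percent-escape any whitespace left inside.
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in ASCII_WHITESPACE else ch
        for ch in stripped
    )


def process_standalone(value: str) -> str:
    """A standalone URL goes straight into the shared step."""
    return process_url(value)


def process_url_list(value: str) -> List[str]:
    """A whitespace-separated list is split first, then each token
    goes through the same shared step."""
    tokens = [t for t in re.split(r"[\t\n\x0c\r ]+", value) if t]
    return [process_url(t) for t in tokens]
```

In this sketch the only difference between the two cases is the split that happens before the shared step, which is why whitespace inside a URL in a list has to arrive already escaped.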

Anne van Kesteren
Received on Friday, 4 November 2011 16:36:19 UTC
