Re: URL work in HTML 5 from Julian Reschke on 2012-09-25 (www-tag@w3.org from September 2012)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Tue, 25 Sep 2012 10:49:39 +0200
To: Robin Berjon <robin@w3.org>
CC: Larry Masinter <masinter@adobe.com>, W3C TAG <www-tag@w3.org>
Message-ID: <50617023.6080402@gmx.de>

On 2012-09-25 10:31, Robin Berjon wrote:
> On 25/09/2012 04:29 , Larry Masinter wrote:
>> I think there was a group willing to consider the redefinition of
>> URLs in HTML5 as a local anomaly within HTML, in a way that didn’t
>> really affect any other format or application.
>
> My understanding is that Anne is working on an improved definition of
> URLs because he noticed demonstrable severe interoperability issues with
> tasks as deceivingly simple as parsing URLs.
>
> Has anyone in this thread taken if only five minutes to perhaps peruse
> the evidence and see if he might not have a point? I ask because I've
> given it a cursory look and what I've seen is ugly.

Of course there is a point. The specs (RFCs 3986 and 3987) do not define 
how to treat broken identifiers. Furthermore, references in HTML 
definitely do require preprocessing (such as dropping leading 
whitespace, or potentially rewriting query parts when not in UTF-8) 
before they can be handled as URIs/IRIs.
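
To make the kind of preprocessing I mean concrete, here is a rough 
sketch in Python. The function name, the set of whitespace characters, 
and the choice to percent-encode the query using the document's 
encoding are my own assumptions for illustration, not text from any 
spec, and it deliberately ignores many other cases of broken input.

```python
from urllib.parse import quote

# Assumption: whitespace to drop around the attribute value.
HTML_WHITESPACE = " \t\n\f\r"

# Characters left untouched in the query; "%" is kept so that
# existing percent-escapes are not encoded a second time.
QUERY_SAFE = "/?:@!$&'()*+,;=%"


def sanitize_html_reference(raw, document_encoding="utf-8"):
    """Hypothetical mapping step: HTML attribute value -> URI reference."""
    # Step 1: drop leading (and trailing) whitespace.
    ref = raw.strip(HTML_WHITESPACE)

    # Step 2: separate fragment and query without interpreting the rest.
    ref, hash_sep, fragment = ref.partition("#")
    before_query, query_sep, query = ref.partition("?")

    # Step 3: percent-encode non-ASCII query characters using the
    # document's encoding rather than always UTF-8. Characters the
    # encoding cannot represent are replaced instead of raising.
    query = quote(query, safe=QUERY_SAFE,
                  encoding=document_encoding, errors="replace")

    return before_query + query_sep + query + hash_sep + fragment


print(sanitize_html_reference("  http://example.com/?q=\u00e9#f ",
                              "iso-8859-1"))
```

The output of a step like this would then be handed to an ordinary 
RFC 3986 parser; the path and fragment are left alone here on purpose.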

This is not a new discussion.

I believe that this can be best handled by acknowledging that what HTML 
uses are identifiers that need some level of sanitization before they 
can be treated as URI/IRI (references).

It appears that Anne's approach is to pretend that the RFCs are broken 
and need to be completely replaced. This of course ignores the fact 
that they are widely implemented outside browsers.

What we IMHO need is a *precise* problem statement, and then a mapping 
layer.

Also, it's not helpful that terminology from 3986 ("resolve") is used 
for something else, leading to even more confusion.

> ...

Best regards, Julian

Received on Tuesday, 25 September 2012 08:51:40 UTC