Re: URI/IRI vs HTML-URL, was: Why Microsoft's authoritative=true won't work and is a bad idea from Robert J Burns on 2008-07-08 (public-html@w3.org from July 2008)

From: Robert J Burns <rob@robburns.com>
Date: Tue, 8 Jul 2008 14:48:03 +0300
To: Martin Duerst <duerst@it.aoyama.ac.jp>
Cc: Julian Reschke <julian.reschke@gmx.de>, Justin James <j_james@mindspring.com>, "'Ian Hickson'" <ian@hixie.ch>, "'Sam Ruby'" <rubys@us.ibm.com>, "'HTTP Working Group'" <ietf-http-wg@w3.org>, public-html@w3.org
Message-Id: <DEFCAB65-ED59-40EF-B5DD-87E6CC7AD62D@robburns.com>

On Jul 8, 2008, at 9:13 AM, Martin Duerst wrote:
>
> As long as query URIs are interpreted based on the encoding of the
> containing page, they will stay useless without that context. I.e.
> they cannot (without further pain) be put into bookmark lists, they
> cannot be sent in email, and so on. The only sensible way to make
> this possible is to do the same as for the path part, namely use
> UTF-8 for the IRI->URI conversion. Freestanding (U/I)RIs with
> query parts may be less important than freestanding (U/I)RIs
> without query parts, but still, they are often convenient.
> However, they won't work if implemented the way HTML5 is currently
> describing them. Also, same as for path parts, a fallback for query
> parts in legacy encodings is still available (and was always
> available): %-encoding.
>
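
To make the point above concrete, here is a minimal Python sketch of
how one query value turns into different URIs depending on the page
encoding. The helper name query_escape and the sample word are
illustrative assumptions, not anything taken from HTML5 or the RFCs.

```python
from urllib.parse import quote, unquote

def query_escape(text: str, page_encoding: str) -> str:
    """Percent-encode text using the bytes it has in page_encoding.

    Hypothetical helper: it models a user agent that derives the query
    octets from the encoding of the containing page.
    """
    return quote(text, safe="", encoding=page_encoding)

word = "café"

# IRI->URI conversion via UTF-8, as is done for the path part.
utf8_form = query_escape(word, "utf-8")        # 'caf%C3%A9'

# The same text taken from a page served as ISO-8859-1.
latin1_form = query_escape(word, "iso-8859-1") # 'caf%E9'

# A recipient that assumes UTF-8 cannot recover the legacy form:
# the lone 0xE9 octet is not valid UTF-8 and becomes U+FFFD.
misread = unquote(latin1_form, encoding="utf-8", errors="replace")
```

Once the second form is copied into a bookmark or an email, nothing
in the URI itself says which encoding produced it.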

Some implementations also break the %-encoding fallback: they first
decode the %-escapes, reinterpret the result in terms of the current
document encoding, and re-encode where they can. For example, if a
percent-encoded sequence represents a Unicode code point that also
exists in the current document encoding, the implementation sends
that character's bytes in the document encoding instead of the
literal percent-encoded bytes. I'm not sure whether this is an
unfixable implementation error or whether we could use HTML5 to get
these implementations back on track, though.
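
The behaviour described above can be modelled roughly as follows.
This is only a sketch under my own assumptions: the function names
are hypothetical, and it assumes the implementation reads the
%-escapes as UTF-8 before mapping them to the document encoding,
which may not match what any particular browser actually does.

```python
from urllib.parse import unquote_to_bytes

def literal_bytes(escaped: str) -> bytes:
    """The octets the %-escapes literally spell out."""
    return unquote_to_bytes(escaped)

def reinterpreted_bytes(escaped: str, document_encoding: str) -> bytes:
    """Hypothetical model of the faulty behaviour.

    Decode the escapes as UTF-8; if the resulting characters exist in
    the document encoding, use those bytes instead of the literal ones.
    """
    raw = unquote_to_bytes(escaped)
    try:
        return raw.decode("utf-8").encode(document_encoding)
    except (UnicodeDecodeError, UnicodeEncodeError):
        # No mapping available: fall back to the literal octets.
        return raw

escaped = "caf%C3%A9"

# What the author of the link asked for: b'caf\xc3\xa9'
intended = literal_bytes(escaped)

# What the model sends from an ISO-8859-1 page: b'caf\xe9'
altered = reinterpreted_bytes(escaped, "iso-8859-1")

# U+20AC has no ISO-8859-1 mapping, so the escapes survive untouched.
untouched = reinterpreted_bytes("%E2%82%AC", "iso-8859-1")
```

The server receives different octets in the first two cases, even
though the link was fully %-encoded to avoid exactly that.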

On Jul 8, 2008, at 11:20 AM, Stefan Eissing wrote:
>
> Am 08.07.2008 um 09:27 schrieb Julian Reschke:
>> The other issue that got a lot of discussion is whether the things  
>> used in HTML should be called "URL", when in reality they are  
>> something else.
>
> Calling them HREFs (even though they also appear in other  
> attributes) would give everyone the right context (HTML) and topic  
> (URLs) without the confusion of redefining existing terms.

From the relevant RFCs, the term "URL reference" already exists and is  
the appropriate term for the value taken by the @href, @cite, @src and  
other attributes ("URI reference" or "IRI reference" might also make  
sense).

Take care,
Rob

Received on Tuesday, 8 July 2008 11:48:52 UTC