Re: "How to Compare URIs" update 3 from Tim Bray on 2003-02-22 (www-tag@w3.org from February 2003)

From: Tim Bray <tbray@textuality.com>
Date: Sat, 22 Feb 2003 15:59:19 -0800
To: Martin Duerst <duerst@w3.org>
Cc: WWW-Tag <www-tag@w3.org>, uri@w3.org
Message-ID: <3E580ED7.3090106@textuality.com>

Martin Duerst wrote:

> Ah, now I see where your confusion is coming from.
> The characters in an URI (the ones that are compared character-by-
> character in namespaces) are just that, characters. URIs are
> defined independent of any particular representation. The URI
> spec says that /dir/a and /dir/%61 are equivalent, independent
> of the representation. They are equivalent if they appear in
> ASCII. They are equivalent if they appear on paper, on the
> side of a bus, and so on. They are equivalent when spoken
> over the radio. And they are equivalent when encoded as UTF-16
> (as your Java example shows) or in EBCDIC.

No, 'a' and %61 are *not* equivalent in an EBCDIC environment.  I just 
don't see where, in RFC2396, it says that the hex-encoding is 
necessarily that of the ASCII value of the character.  I repeat: if I'm 
on an EBCDIC computer, and the URI reads out as /dir/a, that is 
*different* from /dir/%61.  Yes, this is egregiously broken and stupid, 
but it's within the bounds set by RFC2396.
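
To make the ambiguity concrete, here is a minimal sketch in Python.  It 
only illustrates the reading described above and is not something 
RFC2396 specifies.  The helper name escape_octets is hypothetical, and 
Python's cp037 codec is assumed as a stand-in for "an EBCDIC 
environment"; other EBCDIC code pages may differ.

```python
def escape_octets(ch: str, charset: str) -> str:
    """Percent-escape a character using its octets in the given charset.

    Hypothetical helper for illustration only.
    """
    return "".join(f"%{octet:02X}" for octet in ch.encode(charset))


# The same character yields different escapes depending on which
# charset the escaping software assumes.
ascii_escape = escape_octets("a", "ascii")    # octet 0x61
ebcdic_escape = escape_octets("a", "cp037")   # octet 0x81 (cp037 assumed)

# Going the other way: the octet 0x61 read as cp037 is not 'a' at all.
octet_61_in_ebcdic = bytes.fromhex("61").decode("cp037")

print(ascii_escape, ebcdic_escape, repr(octet_61_in_ebcdic))
```

Under that assumption, software that escapes native octets would write 
'a' as %81, and would read %61 as '/' rather than 'a', which is the 
mismatch I am describing.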

> RFC 2396 gives three levels, condensed in the following line:

Actually, the problem is that RFC2396 is just hopelessly unclear.  The 
fact that you and I are unable to agree on what it says is 
incontrovertible proof of this fact.  May I argue for a brief truce?  As 
part of the revision process, I'm working on an essay whose subject is 
what RFC2396bis *should* say on the subject of data and characters and 
octets and %-escaping, so that we don't have to have these endless 
arguments.  Frankly I would rather not waste any more time arguing about 
what the current revision of 2396 says.  -Tim

Received on Saturday, 22 February 2003 18:59:24 UTC