- From: Tim Bray <tbray@textuality.com>
- Date: Sat, 22 Feb 2003 15:59:19 -0800
- To: Martin Duerst <duerst@w3.org>
- Cc: WWW-Tag <www-tag@w3.org>, uri@w3.org
Martin Duerst wrote: > Ah, now I see where your confusion is comming from. > The characters in an URI (the ones that are compared character-by- > character in namespaces) are just that, characters. URIs are > defined independent of any particular representation. The URI > spec says that /dir/a and /dir/%61 are equivalent, independent > of the representation. They are equivalent if they appear in > ASCII. They are equivalent if they appear on paper, on the > side of a bus, and so on. They are equivalent when spoken > over the radio. And they are equivalent when encoded as UTF-16 > (as your Java example shows) or in EBCDIC. No, 'a' and %61 are *not* equivalent in an EBCDIC environment. I just don't see where, in RFC2396, it says that the hex-encoding is necessarily that of the ASCII value of the character. I repeat: if I'm on an EBCDIC computer, and the URI reads out as /dir/a, that is *different* from /dir/%61. Yes, this is egregiously broken and stupid, but it's within the bounds set by RFC2396. > RFC 2396 gives three levels, condensed in the following line: Actually, the problem is that RFC2396 is just hopelessly unclear. The fact that you and I are unable to agree on what it says is incontrovertible proof of this fact. May I argue for a brief truce? As part of the revision process, I'm working on an essay whose subject is what RFC2396bis *should* say on the subject of data and characters and octets and %-escaping, so that we don't have to have these endless arguments. Frankly I would rather not waste any more time arguing about what the current revision of 2396 says. -Tim
Received on Saturday, 22 February 2003 18:59:24 UTC