- From: Stefan Eissing <stefan.eissing@greenbytes.de>
- Date: Fri, 31 Jan 2003 15:45:53 +0100
- To: uri@w3.org
- Cc: Martin Duerst <duerst@w3.org>
On Thursday, 30.01.03, at 21:55 (Europe/Berlin), Martin Duerst wrote:

> As far as I understand, %hh is always usable, and I don't know
> about any schemes that define explicitly that this can be used.
> It may have been that this paragraph was written to take into
> account schemes such as data:, where an additional mechanism
> for encoding octets (base64) is used. My understanding is that
> even in a data: URI, I should still be able to replace "A" by
> "%41", and it should still resolve to the same data.
>

This reminds me of another issue, which Tim Bray describes in http://www.textuality.com/tag/uri-comp-2.html, namely that it is context-dependent whether '%61' can be considered equivalent to the character 'a'. The argument is basically that RFC 2396 allows character encodings other than US-ASCII, so '%61' could denote practically any character as long as the character encoding is unknown.

I argue that any 7-bit octet that is escape-encoded in a URI MUST be equivalent to its US-ASCII character (apart from reserved characters such as %2f). In my opinion, RFC 2396 already defines this:

In RFC 2396, Section 2.1:

"In the simplest case, the original character sequence contains only characters that are defined in US-ASCII, and the two levels of mapping are simple and easily invertible: each 'original character' is represented as the octet for the US-ASCII code for it, which is, in turn, represented as either the US-ASCII character, or else the "%" escape sequence for that octet."

In RFC 2396, Section 2.4.2:

"For example, "%7e" is sometimes used instead of "~" in an http URL path, but the two are equivalent for an http URL."

According to this, my argument should be valid at least for HTTP URIs.

I would like to have this issue clarified in RFC 2396bis for the following reason: the current wording confuses either me or Tim Bray. Given our respective levels of understanding of URIs and the Web, I consider it possible that I am the one who is mistaken. ;-)

However, one way or the other, the spec should address this issue more specifically.

Of course, this is coupled to the UTF-8 issue. Iff UTF-8 becomes *the* encoding for URIs, my issue is resolved and Tim can shorten his excellent document. If UTF-8 "just" becomes the default, then I think my issue remains valid.

So, could this be added to the issues list?

Best regards,
Stefan
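The equivalence rule argued for in this message (an escaped 7-bit octet equals its US-ASCII character, except for reserved characters) can be sketched as a comparison function. This is a minimal illustration under stated assumptions, not text from RFC 2396 or any library: `normalize_escapes` and `equivalent` are hypothetical names, the "safe to decode" set is assumed to be RFC 2396's unreserved set (alphanum plus the marks `-_.!~*'()`), and escapes of octets above 0x7F are deliberately left encoded, since their character depends on the unknown encoding that Tim Bray's document is concerned with.

```python
import re
import string

# Assumption: RFC 2396, Section 2.3 -- unreserved = alphanum | mark.
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_escapes(uri: str) -> str:
    """Decode %hh escapes that stand for unreserved US-ASCII characters.

    Escapes for reserved characters (e.g. %2f for "/") and for octets
    outside 7-bit US-ASCII are kept, with their hex digits upper-cased,
    because decoding them could change the URI's meaning.
    """
    def replace(match: re.Match) -> str:
        octet = int(match.group(1), 16)
        # Only 7-bit octets whose US-ASCII character is unreserved are decoded.
        if octet < 0x80 and chr(octet) in UNRESERVED:
            return chr(octet)
        return "%" + match.group(1).upper()

    return _ESCAPE.sub(replace, uri)


def equivalent(a: str, b: str) -> bool:
    """Compare two URIs after escape normalization; otherwise exact match."""
    return normalize_escapes(a) == normalize_escapes(b)


if __name__ == "__main__":
    print(normalize_escapes("http://example.com/%7euser/%61bc"))
    print(equivalent("http://example.com/a%2Fb", "http://example.com/a/b"))
```

Under this reading, `%7e`/`~` and `%61`/`a` compare equal in any URI, while `%2F` stays distinct from `/` and `%C3%A9` is never decoded. The retained escapes are upper-cased only so that `%2f` and `%2F` compare equal; that choice is an assumption of this sketch, and other normalizations (case of scheme or host, for instance) are not attempted.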
Received on Friday, 31 January 2003 09:46:31 UTC