Re: Issue for RFC 2396bis: 7-bit escapes from Stefan Eissing on 2003-01-31 (uri@w3.org from January 2003)

From: Stefan Eissing <stefan.eissing@greenbytes.de>
Date: Fri, 31 Jan 2003 15:45:53 +0100
To: uri@w3.org
Cc: Martin Duerst <duerst@w3.org>
Message-Id: <B289F80F-352A-11D7-A23E-00039384827E@greenbytes.de>

On Thursday, 30 Jan 2003, at 21:55 (Europe/Berlin), Martin Duerst
wrote:

> As far as I understand, %hh is always usable, and I don't know
> about any schemes that define explicitly that this can be used.
> It may have been that this paragraph was written to take into
> account schemes such as data:, where an additional mechanism
> for encoding octets (base64) is used. My understanding is that
> even in a data: URI, I should still be able to replace "A" by
> "%41", and it should still resolve to the same data.
>
This reminds me of another issue which Tim Bray describes in

http://www.textuality.com/tag/uri-comp-2.html

namely that it is context dependent whether '%61' can be considered
equivalent to the character 'a' or not. The argument is basically that
RFC 2396 allows character encodings other than US-ASCII, and that '%61'
could denote almost any character until the character encoding is known.
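
To illustrate the concern, here is a minimal Python sketch. It assumes
an EBCDIC code page (Python's "cp037" codec) as the "other" character
encoding; that choice is only an example and is not taken from Tim's
document. The escape always yields the same octet, but the character
that octet stands for depends on the encoding assumed.

```python
from urllib.parse import unquote_to_bytes

# "%61" always unescapes to the single octet 0x61.
octets = unquote_to_bytes("%61")

# Which character that octet denotes depends on the assumed encoding.
as_ascii = octets.decode("ascii")    # US-ASCII interpretation
as_ebcdic = octets.decode("cp037")   # EBCDIC interpretation (assumed example)

print(repr(octets), repr(as_ascii), repr(as_ebcdic))
```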

I argue that any 7-bit octet that is escape-encoded in a URI MUST
be equivalent (apart from reserved characters like %2f) to its
US-ASCII character. In my opinion, RFC 2396 already defines this:

In RFC 2396, Ch. 2.1:
"In the simplest case, the original character sequence contains only
   characters that are defined in US-ASCII, and the two levels of
   mapping are simple and easily invertible: each 'original character'
   is represented as the octet for the US-ASCII code for it, which is,
   in turn, represented as either the US-ASCII character, or else the
   "%" escape sequence for that octet."

In RFC 2396, Ch. 2.4.2:
"For example, "%7e" is sometimes used instead of "~" in an http URL
path, but the two are equivalent for an http URL."
According to this, my argument should be valid at least for HTTP URIs.
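
The equivalence argued for here could be sketched as follows. This is a
minimal Python sketch, not a normative algorithm: the function name
normalize_escapes is hypothetical, and the UNRESERVED set is my reading
of the "unreserved" production in RFC 2396, Ch. 2.3. Escapes of
unreserved US-ASCII characters are replaced by the character itself;
every other escape (reserved characters such as %2f, and anything
outside the unreserved set) is left untouched.

```python
import re
import string

# Assumed reading of RFC 2396 "unreserved": alphanum plus the mark characters.
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_escapes(uri: str) -> str:
    """Replace %hh by its US-ASCII character when that character is unreserved."""
    def replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        # Reserved or non-unreserved octet: keep the escape exactly as written.
        return match.group(0)

    return _ESCAPE.sub(replace, uri)


# Under this view, two URIs are equivalent if they normalize to the same string.
print(normalize_escapes("http://example.com/%7euser/%61"))
print(normalize_escapes("http://example.com/a%2fb"))
```

With this, "%7e" and "~" compare equal, as do "%61" and "a", while
"a%2fb" stays distinct from "a/b".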

I would like to have this issue clarified in RFC 2396bis for the
following reason:

The current wording confuses either me or Tim Bray. Given our respective
levels of understanding of URIs and the Web, I consider it a possibility
that I am mistaken. ;-) However, one way or the other, the spec should
address this issue more specifically.

Of course this is coupled to the UTF-8 issue. Iff UTF-8 becomes *the*
encoding for URIs, my issue is resolved and Tim can shorten his
excellent document. If UTF-8 "just" becomes the default, then my issue
stays valid, I think.

So, could this be added to the issues list?

Best Regards, Stefan

Received on Friday, 31 January 2003 09:46:31 UTC