Re: octets <=> ASCII conversion (important) from Martin Duerst on 2004-04-21 (uri@w3.org from April 2004)

From: Martin Duerst <duerst@w3.org>
Date: Wed, 21 Apr 2004 15:16:10 +0900
To: "Roy T. Fielding" <fielding@gbiv.com>
Cc: uri@w3.org
Message-Id: <4.2.0.58.J.20040421123351.06bf2fc0@localhost>

Hello Roy, others,

I have carefully compared draft-fielding-uri-rfc2396bis-05.txt
with the previous version, and with my notes and comments.

There are a lot of improvements in this draft, and in general,
I think this draft is indeed ready to go to the IESG.
Thanks to Roy for his great work!

However, the comment in my mail at
http://lists.w3.org/Archives/Public/uri/2004Mar/0012.html,
cited below and including actual proposed text, does not
seem to have been addressed. Nor did I find any reply saying
that it has been addressed (and how), or explaining why it
would not need to be addressed.

So in case you think that this has been addressed, please
tell me where/how. In case you decided that it does not
need addressing, please tell me why you think so.
In case it needs addressing (and I'm sure it does),
I would be perfectly okay with having the actual fix done
by the RFC editor (after having seen the proposed text
here), because I think there is no disagreement whatsoever
that the correspondence is based on US-ASCII, i.e. that "1"
should correspond to "%31", and so on, and not to something
else. But the spec had better say so explicitly.
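
To make the intended correspondence concrete, here is a minimal
sketch in Python. The function names are made up for illustration
and are not taken from the draft; the only assumption is the one
argued for above, namely that the mapping is based on US-ASCII.

```python
def octet_for_ascii_char(ch: str) -> int:
    """Return the data octet that encodes a single character in US-ASCII."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # Raises UnicodeEncodeError for anything outside US-ASCII.
    return ch.encode("ascii")[0]


def percent_encode_octet(octet: int) -> str:
    """Return the "%XX" form of a single data octet (0x00-0xFF)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("not an octet")
    return "%{:02X}".format(octet)


# "1" is octet 0x31 in US-ASCII, so it corresponds to "%31".
print(percent_encode_octet(octet_for_ascii_char("1")))
```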


Regards,     Martin.


At 20:18 04/03/07 -0500, Martin Duerst wrote:

>I have carefully read up to and including section 4 of
>draft-fielding-uri-rfc2396bis-04.txt. In general, the
>document is in extremely good shape. But there are some
>points that should be fixed. I'll mention them in separate
>emails, the most important ones first.
>
>Sections 2.1-2.4 repeatedly mention how data octets can be
>represented in URIs. For most data octets, it is clearly
>defined how they get represented. For example, the binary octet
>"00100000" (I'll use the C/... notation 0x20 from here on) gets
>represented as %20.
>
>For reserved characters, the document says
>"If no such delimiting role has been assigned, then a
>reserved character appearing in a component represents the data octet
>corresponding to its encoding in US-ASCII."
>
>This allows one to get from reserved characters to octets, but does
>not say how to get from e.g. 0x40 to a reserved character ("@"
>in this case). The reader will probably infer that the inverse
>mapping is used, but this should be said in the document.
>
>The situation is even worse for unreserved characters.
>The closest one comes to finding a correspondence between
>data octets and unreserved characters is at the end of 2.3:
>"For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A
>and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore
>(%5F), or tilde (%7E) should not be created by URI producers and,
>when found in a URI, should be decoded to their corresponding
>unreserved character by URI normalizers."
>
>The informed reader will probably say: "Hey, this looks too
>similar to US-ASCII to be anything else, so let's assume that
>it's US-ASCII". But this is not what the reader of a spec
>should have to do.
>
>So please, at the appropriate place, add a sentence saying
>something like:
>"Data octets which in the US-ASCII character encoding represent
>unreserved characters can be represented by the corresponding
>character. For example, the data octet 0x41 can be represented
>by "%41" or by "A"; for readability and comparability, the latter
>is strongly preferred."
>
>
>Regards,     Martin.
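
The sentence proposed in the quoted message, together with the
normalization rule quoted from the end of 2.3, could be sketched in
Python roughly as follows. This is only an illustration: the names
UNRESERVED, represent_octet and decode_unreserved are hypothetical,
and percent-encoding every octet outside the unreserved set
(including reserved characters) is a simplifying assumption, not
something the draft requires.

```python
import re
import string

# Octets that represent unreserved characters in US-ASCII:
# ALPHA, DIGIT, hyphen, period, underscore, tilde.
UNRESERVED = set((string.ascii_letters + string.digits + "-._~").encode("ascii"))

_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def represent_octet(octet: int) -> str:
    """Represent one data octet, preferring the unreserved character."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("not an octet")
    if octet in UNRESERVED:
        return chr(octet)             # 0x41 -> "A", the preferred form
    return "%{:02X}".format(octet)    # everything else -> "%XX"


def decode_unreserved(uri: str) -> str:
    """Decode only those percent-encoded octets that are unreserved."""
    def repl(match):
        octet = int(match.group(1), 16)
        return chr(octet) if octet in UNRESERVED else match.group(0)
    return _PCT.sub(repl, uri)


print(represent_octet(0x41))             # "A" rather than "%41"
print(decode_unreserved("%41b%2Fc%7e"))  # "%2F" is left untouched
```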
Received on Wednesday, 21 April 2004 02:27:09 UTC