- From: Martin Duerst <duerst@w3.org>
- Date: Sun, 07 Mar 2004 06:18:39 -0500
- To: "Roy T. Fielding" <fielding@gbiv.com>
- Cc: uri@w3.org
I have carefully read up to and including section 4 of draft-fielding-uri-rfc2396bis-04.txt. In general, the document is in extremely good shape. But there are some points that should be fixed. I'll mention them in separate emails, the most important ones first.

Sections 2.1-2.4 repeatedly mention how data octets can be represented in URIs. For most data octets, it is clearly defined how they get represented. For example, the binary octet "00100000" (I'll use the C/... notation 0x20 from here on) gets represented as %20.

For reserved characters, the document says "If no such delimiting role has been assigned, then a reserved character appearing in a component represents the data octet corresponding to its encoding in US-ASCII." This allows one to get from reserved characters to octets, but it does not say how to get from, e.g., 0x40 to a reserved character ("@" in this case). The reader will probably infer that the inverse mapping is used, but the document should say so explicitly.

The situation is even worse for unreserved characters. The closest one comes to finding a correspondence between data octets and unreserved characters is at the end of section 2.3: "For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved character by URI normalizers." The informed reader will probably say: "Hey, this looks too similar to US-ASCII to be anything else, so let's assume that it's US-ASCII." But this is not what the reader of a spec should have to do.

So please, at the appropriate place, add a sentence saying something like: "Data octets which in the US-ASCII character encoding represent unreserved characters can be represented by the corresponding character.
For example, the data octet 0x41 can be represented by "%41" or by "A"; for readability and comparability, the latter is strongly preferred."

Regards,
Martin.
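The octet-to-character mapping requested in this message can be sketched as follows. This is a minimal, non-authoritative Python illustration, assuming the unreserved set quoted from section 2.3 (ALPHA, DIGIT, hyphen, period, underscore, tilde) and US-ASCII as the correspondence. The names `encode_octets`, `normalize_unreserved` and the `safe` parameter are hypothetical and do not come from the draft; which reserved characters have no delimiting role depends on the component, so that choice is left to the caller as an assumption.

```python
import re
import string

# Unreserved characters as enumerated in the quoted section 2.3 text.
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")

# Matches one percent-encoded octet, e.g. "%41" or "%7e".
_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def encode_octets(data: bytes, safe: str = "") -> str:
    """Hypothetical helper: represent data octets as URI characters.

    An octet whose US-ASCII encoding is an unreserved character is written
    as that character (0x41 -> "A"). An octet matching a character in
    `safe` (assumed: reserved characters with no delimiting role in this
    component, e.g. "@") is also written literally, i.e. the inverse of the
    reserved-character mapping. Every other octet becomes %HH.
    """
    out = []
    for octet in data:
        ch = chr(octet)
        if octet < 0x80 and (ch in UNRESERVED or ch in safe):
            out.append(ch)
        else:
            out.append("%%%02X" % octet)
    return "".join(out)


def normalize_unreserved(uri: str) -> str:
    """Hypothetical normalizer: decode percent-encoded unreserved octets.

    "%41" becomes "A" and "%7e" becomes "~"; all other percent-encodings
    (such as "%20") are left exactly as they appear.
    """
    def repl(m: re.Match) -> str:
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else m.group(0)

    return _PCT.sub(repl, uri)


if __name__ == "__main__":
    print(encode_octets(b"A b@c"))             # A%20b%40c
    print(encode_octets(b"A b@c", safe="@"))   # A%20b@c
    print(normalize_unreserved("%41%20%7e"))   # A%20~
```

Under these assumptions, the producer side never emits %41 for 0x41, and the normalizer maps any %41 it finds back to "A", so both sides agree on the preferred form the message asks the draft to state.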
Received on Sunday, 7 March 2004 06:19:02 UTC