- From: by way of Martin Duerst <mike@skew.org>
- Date: Fri, 26 Sep 2003 09:51:52 -0400
- To: uri@w3.org
Section 2.1

To say that you don't mandate the use of any particular encoding, and then to recommend the use of UTF-8 for a certain case, could probably stand for some revision.

Try to avoid the terms "glyph" and "character set", if not used in a manner consistent with UTR#17.

Try to cite ISO/IEC 10646 with at least "ISO/IEC" rather than just "ISO". (And should this be linked as a reference?)

Questionable grammar in the last sentence should be addressed ("We recommend that the data first be encoded, then escaping [certain octets]" reads a little awkwardly.)

The text currently reads:

"As described above, the URI syntax is defined in terms of characters by reference to the US-ASCII encoding of characters to octets. This specification does not mandate the use of any particular mapping between its character set and the octets used to store or transmit those characters."

[...]

"Most URI schemes represent data octets by the US-ASCII character corresponding to that octet, either directly in the form of the character's glyph or by use of an escape triplet (section 2.4)."

"When a URI scheme defines a component that represents textual data consisting of characters from the Unicode (ISO 10646) character set, we recommend that the data be encoded first as octets according to the UTF-8 [RFC2279] character encoding, and then escaping only those octets that are not in the unreserved character set."

I suggest changing it to:

"As described above, the URI syntax is defined in terms of characters by reference to the US-ASCII encoding of characters to octets. This specification only mandates that URIs be composed of abstract characters, regardless of what mapping to octets, if any, is used to store or transmit those characters."

[...]

"Most URI schemes represent data octets by the US-ASCII character corresponding to that octet, either directly in the form of the actual character, or indirectly by use of an escape triplet (section 2.4)."

"When a URI scheme defines a component that represents textual data consisting of characters from the Unicode / ISO/IEC 10646 character repertoire, we recommend first encoding the data as octets according to the UTF-8 [RFC2279] character encoding, and then escaping only those octets that are not in the unreserved character set."

-Mike
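
For illustration only, the two-step procedure in the last quoted paragraph (encode as UTF-8 first, then escape only the octets outside the unreserved set) might be sketched in Python as below. The function name `escape_component` is hypothetical, and the unreserved set used here is an assumption: it is the one later published in RFC 3986 (letters, digits, "-", ".", "_", "~"), which may differ from the set in the draft under discussion.

```python
# Assumption: the unreserved set as later published in RFC 3986
# (ALPHA / DIGIT / "-" / "." / "_" / "~"); the draft may define it differently.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)


def escape_component(text: str) -> str:
    """Hypothetical helper: UTF-8 encode, then escape non-unreserved octets."""
    out = []
    # Step 1: map the characters to octets using UTF-8.
    for octet in text.encode("utf-8"):
        if octet in UNRESERVED:
            # Unreserved octets are represented directly by their US-ASCII character.
            out.append(chr(octet))
        else:
            # Step 2: every other octet becomes an escape triplet, "%" plus two hex digits.
            out.append("%{:02X}".format(octet))
    return "".join(out)
```

Under these assumptions, `escape_component("résumé")` yields `r%C3%A9sum%C3%A9`: each "é" becomes the two UTF-8 octets C3 A9, and each of those octets is escaped separately.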
Received on Friday, 26 September 2003 09:52:49 UTC