encoding related phrasing from by way of Martin Duerst on 2003-09-26 (uri@w3.org from September 2003)

From: by way of Martin Duerst <mike@skew.org>
Date: Fri, 26 Sep 2003 09:51:52 -0400
To: uri@w3.org
Message-Id: <4.2.0.58.J.20030926095145.060c4378@localhost>

Section 2.1

To say that you don't mandate the use of any particular encoding,
and then to recommend the use of UTF-8 for a certain case, could
probably stand some revision.

Try to avoid the terms "glyph" and "character set", if not used
in a manner consistent with UTR#17. Try to cite ISO/IEC 10646
with at least "ISO/IEC" rather than just "ISO". (And should this
be linked as a reference?)

Questionable grammar in the last sentence should be addressed ("We
recommend that the data first be encoded, then escaping [certain
octets]" reads a little awkwardly.)

The text currently reads:

   "As described above, the URI syntax is defined in terms of
   characters by reference to the US-ASCII encoding of characters
   to octets. This specification does not mandate the use of any
   particular mapping between its character set and the octets
   used to store or transmit those characters."

   [...] "Most URI schemes represent data octets by the US-ASCII
   character corresponding to that octet, either directly in the
   form of the character's glyph or by use of an escape triplet
   (section 2.4)."

   "When a URI scheme defines a component that represents textual
   data consisting of characters from the Unicode (ISO 10646)
   character set, we recommend that the data be encoded first as
   octets according to the UTF-8 [RFC2279] character encoding, and
   then escaping only those octets that are not in the unreserved
   character set."

I suggest changing it to:

   "As described above, the URI syntax is defined in terms of
   characters by reference to the US-ASCII encoding of characters
   to octets. This specification only mandates that URIs be
   composed of abstract characters, regardless of what mapping to
   octets, if any, is used to store or transmit those characters."

   [...] "Most URI schemes represent data octets by the US-ASCII
   character corresponding to that octet, either directly in the
   form of the actual character, or indirectly by use of an
   escape triplet (section 2.4)."

   "When a URI scheme defines a component that represents textual
   data consisting of characters from the Unicode / ISO/IEC 10646
   character repertoire, we recommend first encoding the data as
   octets according to the UTF-8 [RFC2279] character encoding, and
   then escaping only those octets that are not in the unreserved
   character set."
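For illustration only, here is a minimal Python sketch of the two-step procedure recommended above: encode the text as UTF-8 first, then escape only the octets outside the unreserved set. The function name escape_text_component is hypothetical. The unreserved set is assumed to be ALPHA, DIGIT, "-", ".", "_" and "~" (the set later standardized in RFC 3986), and the draft under discussion may define it differently.

```python
# Assumption: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
# (the set later fixed in RFC 3986, section 2.3); the draft may differ.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)


def escape_text_component(text: str) -> str:
    """Hypothetical helper: UTF-8 encode text, then escape non-unreserved octets."""
    out = []
    # Step 1: map the abstract characters to octets using UTF-8.
    for octet in text.encode("utf-8"):
        if octet in UNRESERVED:
            # Unreserved octet: written directly as its US-ASCII character.
            out.append(chr(octet))
        else:
            # Step 2: every other octet becomes an escape triplet "%XX".
            out.append("%{:02X}".format(octet))
    return "".join(out)


print(escape_text_component("Dürst résumé"))  # D%C3%BCrst%20r%C3%A9sum%C3%A9
```

Note that "ü" (U+00FC) becomes the two escape triplets %C3%BC because UTF-8 encodes it as two octets, while unreserved ASCII letters pass through unchanged.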

-Mike

Received on Friday, 26 September 2003 09:52:49 UTC