- From: Peter Saint-Andre <stpeter@stpeter.im>
- Date: Fri, 08 Jun 2012 08:44:56 -0600
- To: John C Klensin <john-ietf@jck.com>
- CC: public-iri@w3.org
On 6/8/12 1:29 AM, John C Klensin wrote:
>
> --On Thursday, June 07, 2012 13:47 -0600 Peter Saint-Andre
> <stpeter@stpeter.im> wrote:
>
>>> 4. Strangely, RFC 6365 does not define "UCS", so I suppose
>>> it's OK to define that here.
>
> I can't speak for Paul's reasoning because I don't think we
> discussed it explicitly, but omitting it from 6365 was
> deliberate on my part. There were two reasons. The first is
> that "universal character set" is itself ambiguous as to whether
> it refers to the Unicode/10646 code set or some other attempt.
> One can define that problem away if one assumes that the readers
> will carefully refer to the definitions even when they think
> they know what a term means (my experience indicates that rarely
> happens but YMMD). Second and far more important, I think we do
> ourselves and our audience no favors by using essentially
> synonymous terms interchangeably to refer to the same thing. It
> does not help with understanding and may cause confusion. The
> practice at the time RFC 2277 was written was to call that thing
> "ISO 10646" (not correct when 2277 was written, but see below).
> Once we discovered (more or less around the time RFCs 3454 and
> 3490 were coming together) that we had clear requirements for
> property tables (and at the time, encodings) that were not part
> of ISO/IEC 10646 itself, the practice shifted toward calling
> that thing "Unicode". We've gotten most of the community used
> to seeing those two terms as mostly interchangeable and being
> clear about the distinction when it is important. Introducing
> "UCS" to the mix adds no value and risks reopening the mini-flap
> about our combining "character repertoire", "code set" (or
> "CCS"), and "encoding" into "charset" in RFC 2277 (and, earlier,
> RFC 1341 and its successors).
>
>
> (Massive nit-pick follows, but these things actually are
> important if one wants a clear and useful definition)
>
> I don't believe 3987bis should define "UCS";

Yes, I was going to suggest that, but I wasn't sure what to propose in
its place (i.e., "ISO/IEC 10646" or "Unicode" -- the latter has the
benefit of being much more familiar to most people who would read this
specification).

> I believe it should
> get rid of the term entirely even if that means rewriting some
> sentences rather than just performing string substitution. As
> an example of the desirability of doing this, please read the
> first paragraph of Section 2.1 [draft-ietf-iri-3987bis-11].
> First, despite the earlier definition and the use of "Universal
> Character Set" in the Abstract [1] it notes "Universal Character
> Set" in parentheses, and then cites [ISO10646]. The intervening
> comma implies that those are two separate definitions, adding to
> the potential confusion. Second, this definition (and the
> other definitions, see [1] below) appears to pretend that
> Unicode and ISO/IEC 10646 are the same, which they are not. RFC
> 6365 was extremely careful about the relationship, which is
> another reason to use it rather than defining new terms.
>
> There is an incidental problem about what "primarily" means in
> the key sentence. There doesn't seem to be any nearby
> explanation. If there isn't one, it should be dropped.
>
> Recommendation: In Section 2.1,
>
> Old:
> The IRI syntax extends the URI syntax in [RFC3986] by
> extending the class of unreserved characters, primarily
> by adding the characters of the UCS (Universal Character
> Set, [ISO10646]) beyond U+007F, subject...
>
> New:
> The IRI syntax extends the URI syntax in [RFC3986] by
> extending the class of unreserved characters by adding
> the characters (code points) of ISO/IEC 10646
> [ISO10646] outside the ASCII repertoire, subject...

Works for me.

Peter

--
Peter Saint-Andre
https://stpeter.im/
Received on Friday, 8 June 2012 15:03:14 UTC