Re: rfc3987bis and RFC 6365

On 6/8/12 1:29 AM, John C Klensin wrote:
> 
> --On Thursday, June 07, 2012 13:47 -0600 Peter Saint-Andre
> <stpeter@stpeter.im> wrote:
> 
>>> 4. Strangely, RFC 6365 does not define "UCS", so I suppose
>>> it's OK to define that here.
> 
> I can't speak for Paul's reasoning because I don't think we
> discussed it explicitly, but omitting it from 6365 was
> deliberate on my part.  There were two reasons.  The first is
> that "universal character set" is itself ambiguous as to whether
> it refers to the Unicode/10646 code set or some other attempt.
> One can define that problem away if one assumes that the readers
> will carefully refer to the definitions even when they think
> they know what a term means (my experience indicates that rarely
> happens but YMMD).  Second and far more important, I think we do
> ourselves and our audience no favors by using essentially
> synonymous terms interchangeably to refer to the same thing.  It
> does not help with understanding and may cause confusion.  The
> practice at the time RFC 2277 was written was to call that thing
> "ISO 10646" (not correct when 2277 was written, but see below).
> Once we discovered (more or less around the time RFCs 3454 and
> 3490 were coming together) that we had clear requirements for
> property tables (and at the time, encodings) that were not part
> of ISO/IEC 10646 itself, the practice shifted toward calling
> that thing "Unicode".  We've gotten most of the community used
> to seeing those two terms as mostly interchangeable and being
> clear about the distinction when it is important.   Introducing
> "UCS" to the mix adds no value and risks reopening the mini-flap
> about our combining "character repertoire", "code set" (or
> "CCS"), and "encoding" into "charset" in RFC 2277 (and, earlier,
> RFC 1341 and its successors).
> 
> 
> (Massive nit-pick follows, but these things actually are
> important if one wants a clear and useful definition)
> 
> I don't believe 3987bis should define "UCS"; 

Yes, I was going to suggest that, but I wasn't sure what to propose in
its place (i.e., "ISO/IEC 10646" or "Unicode" -- the latter has the
benefit of being much more familiar to most people who would read this
specification).

> I believe it should
> get rid of the term entirely even if that means rewriting some
> sentences rather than just performing string substitution.  As
> an example of the desirability of doing this, please read the
> first paragraph of Section 2.1 [draft-ietf-iri-3987bis-11].
> First, despite the earlier definition and the use of "Universal
> Character Set" in the Abstract [1] it notes "Universal Character
> Set" in parentheses, and then cites [ISO10646].  The intervening
> comma implies that those are two separate definitions, adding to
> the potential confusion.   Second, this definition (and the
> other definitions, see [1] below) appears to pretend that
> Unicode and ISO/IEC 10646 are the same, which they are not.  RFC
> 6365 was extremely careful about the relationship, which is
> another reason to use it rather than defining new terms.
> 
> There is an incidental problem about what "primarily" means in
> the key sentence.   There doesn't seem to be any nearby
> explanation.  If there isn't one, it should be dropped.
> 
> Recommendation:  In Section 2.1, 
> 
> Old:
>  The IRI syntax extends the URI syntax in [RFC3986] by
>  extending the class of unreserved characters, primarily
>  by adding the characters of the UCS (Universal Character
>  Set, [ISO10646]) beyond U+007F, subject...
> 
> New:
>  The IRI syntax extends the URI syntax in [RFC3986] by
>  extending the class of unreserved characters by adding
>  the characters (code points) of ISO/IEC 10646
>  [ISO10646] outside the ASCII repertoire, subject...

Works for me.

Peter

-- 
Peter Saint-Andre
https://stpeter.im/

Received on Friday, 8 June 2012 15:03:14 UTC