Re: UTF-8 revision

Ned Freed (Ned.Freed@INNOSOFT.COM)
Tue, 02 Sep 1997 10:57:02 -0700 (PDT)


In-reply-to: "Your message dated Tue, 02 Sep 1997 12:41:16 -0400"
To: Francois Yergeau <yergeau@alis.com>
Cc: Ned Freed <Ned.Freed@INNOSOFT.COM>, ietf-charsets@INNOSOFT.COM
Message-id: <01IN5Y2E02T294E6GE@INNOSOFT.COM>

> At 12:42 31/08/97 -0700, Ned Freed wrote:
> >(1) The discussion of the Hangul mess and versioning is far too
> >    wishy-washy. What needs to be said is that the charset label "UTF-8" is
> >    aligned with the character assignments in Unicode 2.0 or later and that
> >    it is NOT aligned with the assignments in Unicode 1.0 or 1.1, in
> >    particular the old Hangul range.

> Agreed, it needs to be much more explicit.  What about the following
> changes in section 5 :

> 1st paragraph:

>  This memo is meant to serve as the basis for registration of a MIME
>  character set parameter (charset) [MIME].  The proposed charset
>  parameter value is "UTF-8".  This string would label media types
>  containing text consisting of characters from the repertoire of ISO/IEC
>  10646 including all amendments at least up to amendment 5 (Korean
>  block), encoded to a sequence of octets using the encoding scheme
>  outlined above.  UTF-8 is suitable for use in MIME content types
>  under the "text" top-level type.

I _really_ like this text -- it says exactly what it needs to say and 
nothing more.

> BTW, shouldn't the reference to [MIME] above be changed to refer to
> draft-freed-charset-reg-02.txt ?

Yes, I think it should, if for no other reason than I expect these three
documents (Harald's, yours, mine) to go out as a unit.

> Last paragraph, now split in two:

>  In practice, then, a version-independent label is warranted, provided
>  the label is understood to refer to all versions after Amendment 5,
>  and provided no incompatible changes actually occur.  Should
>  incompatible changes occur in a later version of ISO 10646, the MIME
>  charset label defined here will stay aligned with the previous version
>  until and unless the IETF specifically decides otherwise.

This is great!

>  Should the
>  need ever arise to distinguish data containing Hangul encoded according to
>  Unicode 1.1, then a version-dependent label, for that version only, should
>  be registered (a suggestion would be "UNICODE-1-1-UTF-8"), in order to
>  retain the advantages of a version-independent label for 2.0 and later
>  versions.  Such a version-dependent label could even be registered before
>  actual need arises, pre-emptively, but it is important to strongly
>  recommend against creating any new Hangul-containing data without
>  taking Amendment 5 of ISO 10646 into account.

> Note that this last sentence is actually a suggestion that should perhaps
> be decided at once.  Do we want to pre-emptively register
> "UNICODE-1-1-UTF-8" or some such?  If so, let's have affirmative language;
> if not, let's remove that last sentence.

Hmm. Well, now that you mention it, I tend to agree that simply registering a
label now would be a lot simpler. On the other hand, we certainly don't want to
encourage use of Unicode 1.1. But registering names for things is useful even
when we don't want them to be used. And we certainly have registered a lot of
charsets in the past -- I've never had a problem with registering so many, only
with the fact that the registrations in far too many cases are flawed in some
way.

As such, I think the right thing is to go ahead and register the name but to
state that use of this character set is strongly discouraged.

> >    I therefore think that
> >    this specification needs to say that it aligns automatically with
> >    all future versions of Unicode that don't make incompatible changes, but
> >    the minute one is made it stays aligned with the old version until and
> >    unless the IETF specifically decides otherwise.

> I think the new language above addresses that.  How is that?

I think it is excellent. I only hope we can get it through the process. Ceding
of change control, even to a "sister" standards body, even with clearly
delineated rules that prevent it from hindering interoperability, can be
a messy thing in the IETF.

				Ned
