
Re: Accept-Charset support



At 19:49 09-12-96 -0800, Erik van der Poel wrote:
>> The wise implementer, however, would be well advised to support
>> the longer tag as an ad hoc alias.
>
>I'm not sure what you mean by "ad hoc alias", but the term "alias" is
>used in this context (Internet "charsets") to mean a synonym.

I used "ad hoc" in the sense of "for a particular purpose", in this case to
deal with existing pages that use the unregistered "UNICODE-1-1-UTF-8" label.
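
To make the "ad hoc alias" idea concrete, here is a minimal sketch of how an
implementation might fold the longer tag into the registered one.  The table
and the function name canonical_charset are my own illustration, not part of
any registry or library.

```python
# Hypothetical alias table: unregistered labels seen in the wild, mapped to
# the registered charset name they should be treated as.
CHARSET_ALIASES = {
    "unicode-1-1-utf-8": "utf-8",
}

def canonical_charset(label):
    """Return the name to use internally for a charset label.

    Charset names are compared case-insensitively, so the label is
    lowercased before the alias lookup.  Unknown labels pass through.
    """
    name = label.strip().lower()
    return CHARSET_ALIASES.get(name, name)

print(canonical_charset("UNICODE-1-1-UTF-8"))
print(canonical_charset("UTF-8"))
```

Both calls yield the same internal name, so pages carrying either label go
through the same decoder.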

>Are "unicode-1-1-utf-8" and "utf-8" synonymous?

For practical purposes, yes, unless one has to deal with data containing
Korean Hangul coded according to Unicode 1.1.  There doesn't seem to be any
such data extant on the Internet, which is why I did not bother to register
"UNICODE-1-1-UTF-8".

>If so, what is the name of UTF-8-encoded Unicode 2.0?

"UTF-8", which is also adequate for any future versions of Unicode, unless
incompatible changes happen again (God forbid!).

>Without rehashing the whole debate that you say already took place on
>those other mailing lists (which I didn't follow), could you briefly
>explain the future plans for the charset name "utf-8"? I glanced at RFC
>2044 but didn't immediately see anything about this.

The plan is to use "UTF-8" for all versions of UTF-8 encoded Unicode, which
is workable, and even advantageous if no further incompatible changes occur
("incompatible changes" means deletion or displacement of one or more code
points).
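
The definition of "incompatible change" can be sketched as a check between
two versions of a character table.  The tables below are toy data with
made-up entries, not real Unicode assignments, and is_compatible is a
hypothetical helper name.

```python
def is_compatible(old, new):
    """True if every code point assigned in `old` still denotes the same
    character in `new`, i.e. nothing was deleted or moved.  Code points
    that exist only in `new` are pure additions and are allowed."""
    return all(new.get(cp) == char for cp, char in old.items())

# Toy tables (illustrative only): code point -> character identity.
v_old = {0x0041: "LETTER A", 0x3400: "SYLLABLE X"}
v_added = {0x0041: "LETTER A", 0x3400: "SYLLABLE X", 0x5000: "NEW CHAR"}
v_moved = {0x0041: "LETTER A", 0xAC00: "SYLLABLE X"}

print(is_compatible(v_old, v_added))  # addition only
print(is_compatible(v_old, v_moved))  # a character was displaced
```

Only the second case, where an existing character changes position, would
call for a new label under this plan.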

There were two options:

Plan 1: register
 UNICODE-1-1-UTF-8
 UNICODE-2-0-UTF-8
 UNICODE-3-0-UTF-8 when 3.0 comes out
 etc.

Plan 2: register
 UNICODE-1-1-UTF-8 if it turns out to be needed
 UTF-8
 stop there

Consider what happens soon after Unicode 3.0 release, assuming that the UTC
and ISO/IEC JTC1/SC2/WG3 stick to their pledges of no further incompatible
changes.  Consider more specifically the case of an upgraded server (3.0)
sending a doc to an older client (2.0).

If the new server labels the content "UNICODE-3-0-UTF-8", the old client
fails to recognize that and refuses to process/display: total loss of
functionality.

If the new server sticks to a "UTF-8" label, the old client works as usual;
there may be partial malfunction if the data contains some of the characters
new to 3.0, but nothing the old client recognizes will be wrong.
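
The two cases can be sketched with a toy "old client".  Everything here is
an assumption made for illustration: the function name old_client_render,
the set of labels it knows, and the use of U+13A0 as a stand-in for a
character the old client has never heard of.

```python
# Labels a hypothetical older client recognizes.
KNOWN_LABELS = {"utf-8"}

# Stand-in for characters added after the client was written (assumption).
UNKNOWN_TO_OLD_CLIENT = {"\u13a0"}

def old_client_render(label, body):
    """Return the text the old client would display, or None if it
    refuses the document because it does not recognize the label."""
    if label.strip().lower() not in KNOWN_LABELS:
        return None  # total loss of functionality
    text = body.decode("utf-8")
    # Known characters are shown as-is; unknown ones degrade to U+FFFD.
    return "".join(
        "\ufffd" if ch in UNKNOWN_TO_OLD_CLIENT else ch for ch in text
    )

doc = "abc\u13a0".encode("utf-8")
print(old_client_render("UNICODE-3-0-UTF-8", doc))
print(old_client_render("UTF-8", doc))
```

With the version-specific label the client shows nothing at all; with the
plain label it shows everything it knows and a placeholder for the rest.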

Hence you have a better transition and better interoperability with a
non-version-specific label, assuming no incompatible changes.  The
registration of "UTF-8" is a bet that the relevant committees will stick to
their word.

Now the 1.1 => 2.0 transition did involve incompatible changes, but apparently
none that matter in practice.  If the problem does show up, it can be addressed
without screwing up Plan 2, by registering "UNICODE-1-1-UTF-8", or perhaps
"HANGUL-1-1-UTF-8" or some such.

Regards,

-- 
François Yergeau <yergeau@alis.com>
Alis Technologies Inc., Montréal
Tel: +1 (514) 747-2547
Fax : +1 (514) 747-2561

