- From: Francois Yergeau <yergeau@alis.com>
- Date: Tue, 10 Dec 1996 10:48:17 -0500
- To: www-international@w3.org
- Cc: erik@netscape.com, Klaus Weide <kweide@tezcat.com>
At 19:49 09-12-96 -0800, Erik van der Poel wrote:
>> The wise implementer, however, would be well advised to support
>> the longer tag as an ad hoc alias.
>
>I'm not sure what you mean by "ad hoc alias", but the term "alias" is
>used in this context (Internet "charsets") to mean a synonym.

I used "ad hoc" in the sense of "for a particular purpose", in this case to deal with existing pages that use the unregistered "UNICODE-1-1-UTF-8" label.

>Are "unicode-1-1-utf-8" and "utf-8" synonymous?

For practical purposes, yes, unless one has to deal with data containing Korean Hangul coded according to Unicode 1.1. There doesn't seem to be any such data extant on the Internet, which is why I did not bother to register "UNICODE-1-1-UTF-8".

>If so, what is the name of UTF-8-encoded Unicode 2.0?

"UTF-8", which is also adequate for any future versions of Unicode, unless incompatible changes happen again (God forbid!).

>Without rehashing the whole debate that you say already took place on
>those other mailing lists (which I didn't follow), could you briefly
>explain the future plans for the charset name "utf-8"? I glanced at RFC
>2044 but didn't immediately see anything about this.

The plan is to use "UTF-8" for all versions of UTF-8 encoded Unicode, which is workable, and even advantageous, if no further incompatible changes occur ("incompatible changes" means deletion or displacement of one or more code points). There were two options:

Plan 1: register
    UNICODE-1-1-UTF-8
    UNICODE-2-0-UTF-8
    UNICODE-3-0-UTF-8 when 3.0 comes out
    etc.

Plan 2: register
    UNICODE-1-1-UTF-8 if it turns out to be needed
    UTF-8
    stop there

Consider what happens soon after the Unicode 3.0 release, assuming that the UTC and ISO/IEC JTC1/SC2/WG3 stick to their pledges of no further incompatible changes. Consider more specifically the case of an upgraded server (3.0) sending a doc to an older client (2.0).

If the new server labels the content "UNICODE-3-0-UTF-8", the old client fails to recognize the label and refuses to process or display the document: total loss of functionality.

If the new server sticks to a "UTF-8" label, the old client works as usual; there may be partial malfunction if the data contains some of the characters new to 3.0, but nothing the old client recognizes will be wrong.

Hence you have a better transition and better interoperability with a non-version-specific label, assuming no incompatible changes. The registration of "UTF-8" is a bet that the relevant committees will stick to their word.

Now, the 1.1 => 2.0 transition did involve incompatibilities, but they are apparently irrelevant in practice. If the problem does show up, it can be addressed without screwing up Plan 2, by registering "UNICODE-1-1-UTF-8", or perhaps "HANGUL-1-1-UTF-8" or some such.

Regards,

-- 
François Yergeau <yergeau@alis.com>
Alis Technologies Inc., Montréal
Tel: +1 (514) 747-2547
Fax: +1 (514) 747-2561
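
The server/client scenario above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not anything taken from RFC 2044 or from a real client: the alias table and the names `ALIASES`, `resolve_charset` and `client_decode` are hypothetical, and the "old client" is assumed to know only the "UTF-8" label plus the ad hoc "UNICODE-1-1-UTF-8" alias.

```python
# Hypothetical label table for an "old" client. The second entry is the
# ad hoc alias for existing pages that use the unregistered longer label.
ALIASES = {
    "utf-8": "utf-8",
    "unicode-1-1-utf-8": "utf-8",
}


def resolve_charset(label):
    """Return the canonical charset name, or None if the label is unknown."""
    return ALIASES.get(label.strip().lower())


def client_decode(label, data):
    """Decode the bytes if the label is recognized, otherwise refuse (None)."""
    charset = resolve_charset(label)
    if charset is None:
        # Unknown label: the client cannot process or display the document.
        return None
    return data.decode(charset)


body = "Montréal".encode("utf-8")

# A version-specific label the old client has never seen: total loss.
print(client_decode("UNICODE-3-0-UTF-8", body))

# The version-independent label: the old client works as usual.
print(client_decode("UTF-8", body))
```

With a version-specific label the hypothetical client returns nothing at all, while the plain "UTF-8" label lets it decode everything it understands, which is the interoperability argument for Plan 2.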
Received on Tuesday, 10 December 1996 10:53:55 UTC