- From: <dank@blacks.jpl.nasa.gov>
- Date: Sun, 24 Oct 1993 02:42:26 -0600 (MDT)
- To: ietf-wnils@ucdavis.edu, ietf-charsets@INNOSOFT.COM
I just finished reading the ietf-charsets archives. It looks like [meta?-]debate over goals and encoding was still raging furiously on the ietf-charsets list as of a month ago. Let me provide a novice's summary, for the possible benefit of wnils people. I apologize in advance for any inaccuracies.

---

Everybody wants to work together to define a super character set to handle all the world's languages. There seem to be two major families of super character sets: ISO 2022 (used, for example, by the X Consortium and the MULE text editor) and ISO 10646 (used, more or less, by Plan 9 and the SAM text editor).

MIME chose to specify the character set used in the header of each item, but this approach is not viewed as promising for the future. RFC 1345 is ISO 10646 based, and defines a representation of ISO 10646 using 'mnemonic' sequences.

ISO 2022 is seen as complex and too stateful to be promising for the future, although it has seen real-world use.

ISO 10646 is seen as the main development path. It has several variants and several possible encoding schemes. Unification of Han characters from different languages was tried in the Unicode variant and met with strenuous objection, although Westerners don't quite understand what the issue is. So Unicode is not the answer, although something close to Unicode will be. UCS-2, UCS-3, and UCS-4 seem to be ISO 10646 related character sets of roughly 2^16, 2^24, and 2^32 codes each. UTF-2 is a proposed encoding for all of these.

The final solution will not be based on 16-bit wide characters externally. Rather, it will be >16 bits internally and use a variable-length representation externally, partly for compatibility with ASCII and partly to represent common symbols with short codes, even for Far East languages. It will allow intermixing of dozens of languages within the same paragraph.
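To make the variable-length idea concrete, here is a minimal sketch in Python. It is not anything posted to the list: it assumes the bit layout of the encoding later standardized as UTF-8, which as far as I know descends from the UTF-2 (FSS-UTF) proposal, and the function names are made up for illustration. ASCII characters stay one byte each, and wider code points take two to four bytes.

```python
def encode_variable_length(code_point: int) -> bytes:
    """Encode one code point using the UTF-8 bit layout (assumed here)."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point out of range for this sketch")
    if code_point < 0x80:
        # 7-bit ASCII passes through unchanged as a single byte.
        return bytes([code_point])
    if code_point < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def encode_text(text: str) -> bytes:
    """Encode a whole string; scripts can be mixed freely in one text."""
    return b"".join(encode_variable_length(ord(ch)) for ch in text)


if __name__ == "__main__":
    for ch in "A\u00e9\u65e5":
        print(f"U+{ord(ch):04X} -> {encode_variable_length(ord(ch)).hex()}")
```

Because no byte of a multi-byte sequence falls in the ASCII range, plain ASCII text is unchanged by the encoding, and several scripts can sit side by side in one paragraph without any mode switching.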
The issues of bidirectionality, comparison, and equality testing were mentioned very briefly; they may have been discussed offline at a BOF. Perhaps someone familiar with these issues could post a word or two about them.

A list of eight or ten important properties of the final encoding solution was posted and more or less agreed to. Several proposals for the encoding were posted, with no clear winner. Most try to be compatible with the UTF-2 encoding.

For several months, people argued and had trouble communicating, although they seemed in basic agreement about goals. The list had been silent for about a month until the Unicode cross-post from ietf-wnils.

---

- Dan Kegel
Received on Sunday, 24 October 1993 02:46:07 UTC