- From: Daniel R. Kegel <dank@alumni.cco.caltech.edu>
- Date: Thu, 27 Jan 1994 23:41:08 -0800
- To: ietf-charsets@INNOSOFT.COM
- Cc: insoft-l@cis.vutbr.cz
[ietf-charsets' charter is to decide how best to represent text on the Internet, now that ASCII is no longer enough for most Internet users. Most members of the list seem happy with Unicode. Mr. Ohta is violently opposed, and has proposed extending Unicode with several bits *per character* to indicate language. When that was shot down, he proposed an extension of ISO 2022 instead which ignores Unicode completely. I think something midway between Mr. Ohta's two proposals might make more sense. -dan]

I am concerned that Japan may ignore Unicode [see the archives of INSOFT-L referred to in my last message] because it fails to address an important need from their point of view: encoding language. A mixed Korean/Japanese/Chinese document in *plain* Unicode CANNOT be displayed in a palatable way: Han unification gives the Chinese, Japanese, and Korean variants of a character the same code point, so a renderer has no way to pick the typographically correct glyph for each run of text. This renders plain Unicode unacceptable for transmitting this type of document over the Internet. Worse, there is no standard way of marking up a Unicode document to indicate language, so even adorned Unicode cannot be used interoperably for this kind of document on the net.

Of course, we could wait for UCS-4 to solve this problem, but it isn't anywhere near ready, won't be for many years, and IMHO is overkill for the problem at hand.

A quick and dirty way to address the problem would be to define a set of control codes as an extension of Unicode to indicate language, in much the same way as ISO 2022 defines control codes to switch character sets. Display applications which do not support different fonts for different languages can simply ignore the codes. Applications which deal with non-Han languages need not bother with the codes, as plain Unicode is sufficient for those languages. The codes should cause little overhead, as most documents do not change language very frequently, and they can in any case be omitted when not needed.

Unless something like this is done in a way that gains at least grudging acceptance in Japan, we may not end up with a truly interoperable method of representing text on the Internet!

Folks, do you want Unicode to be the universal way to represent text, as I do? Do you agree that there is a serious disconnect with Japan on the usability of Unicode for mixed C/J/K text? Isn't a standard way of layering language encoding on Unicode desirable? Or am I way out in left field here?

- Dan Kegel (dank@alumni.caltech.edu)
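
[To make the control-code idea above concrete, here is a rough sketch in C of how such a scheme might work. Everything specific in it is assumed for illustration: the tag code point (borrowed here from the Private Use Area), the two-letter ISO 639 payload, and the helper routines are hypothetical, not anything Unicode or ISO 2022 actually defines.]

    /*
     * Illustrative sketch only.  One HYPOTHETICAL control code
     * announces "the next two code units carry a 2-letter ISO 639
     * language code"; language-aware renderers switch fonts there,
     * language-unaware renderers just skip three units.
     */
    #include <stdio.h>

    typedef unsigned short unichar;   /* one 16-bit Unicode code unit */

    #define LANG_TAG 0xE800           /* hypothetical tag introducer,
                                         borrowed from the Private
                                         Use Area for this sketch */

    /* Stub renderer: a real application would draw a glyph here. */
    static void render_char(unichar c)
    {
        printf("U+%04X ", (unsigned)c);
    }

    /* Stub font switcher for the language-aware renderer. */
    static void select_font(const char *lang)
    {
        printf("\n[font: %s] ", lang);
    }

    /* Language-unaware display: skips the tags, as applications
     * without per-language fonts could under the proposal. */
    static void display_plain(const unichar *s, int n)
    {
        int i;
        for (i = 0; i < n; i++) {
            if (s[i] == LANG_TAG) { i += 2; continue; }  /* skip tag */
            render_char(s[i]);
        }
        printf("\n");
    }

    /* Language-aware display: switches fonts at each tag. */
    static void display_tagged(const unichar *s, int n)
    {
        char lang[3] = "??";          /* current ISO 639 code */
        int i;
        for (i = 0; i < n; i++) {
            if (s[i] == LANG_TAG && i + 2 < n) {
                lang[0] = (char)s[i + 1];
                lang[1] = (char)s[i + 2];
                select_font(lang);
                i += 2;
                continue;
            }
            render_char(s[i]);
        }
        printf("\n");
    }

    int main(void)
    {
        /* Mixed text: the unified Han character U+9AA8 (BONE) tagged
         * Japanese, then the same code point tagged Chinese -- same
         * character, different preferred glyph style. */
        const unichar text[] = {
            LANG_TAG, 'j', 'a', 0x9AA8,
            LANG_TAG, 'z', 'h', 0x9AA8
        };
        const int n = (int)(sizeof text / sizeof text[0]);

        display_plain(text, n);
        display_tagged(text, n);
        return 0;
    }

[One caveat the sketch makes visible: even the "ignoring" renderer must know the tag's length in order to skip it, otherwise it would display the ISO 639 letters as text. So the codes would have to be part of the base standard, not a private convention, for plain-Unicode software to stay interoperable with tagged streams.]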