- From: Daniel R. Kegel <dank@alumni.cco.caltech.edu>
- Date: Thu, 27 Jan 1994 00:48:08 -0800
- To: ietf-charsets@INNOSOFT.COM
- Cc: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>, dank@alumni.cco.caltech.edu
Sorry if this is old news to you folks, but I just stumbled on a mailing list of interest to the ietf-charsets group. Here is an example posting pulled from their archive. This is the same group that is starting up the journal _I18N_. They are looking for a new moderator, by the way.

p.s. This article contains a good summary of the Unicode vs. Japan situation, as seen by the West. Mr. Ohta's position is apparently shared by many in the East, although they express it less forcefully. The key worry is achieving pleasing display of mixed Chinese/Korean/Japanese text, which requires encoding language (or, equivalently, font). The Japanese position is that this should be covered by any character set standard; the West's position is that this is external to the standard. Another problem is that the Han unification was not done with the politeness needed for success in Japan. Mr. Ohta, am I close?

- Dan Kegel (dank@alumni.caltech.edu)

----------------- example from insoft-l@cis.vutbr.cs -----------------

>From: Bowyer Jeff <jbowyer@cis.vutbr.cs>
Subject: Consolidated Answers for "Unicode and the Japanese Market"
To: insoft-l@cis.vutbr.cs
Date: Fri, 13 Aug 1993 14:03:05 +0200 (MET DST)
Reply-To: jbowyer@cis.vutbr.cs
X-Mailer: ELM [version 2.4 PL20]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 25207

Here are the consolidated responses to the original questions from Bob Peterson ("gemgrp::peterson"@tle.enet.dec.com):

I have heard that while Unicode contains Kanji, it does so in a way that is not acceptable to the Japanese market, and hence was not approved by them in recent votes. Does this mean a product that supports Unicode alone will not be as acceptable as a product that handles Japanese character sets using other encoding methods?

What one other encoding should a product use in addition to Unicode in order to succeed in Japan? Or will Unicode be adapted to succeed? How soon?
(It is bad enough we will probably make two versions of some products, 8-bit ISO-Latin and Unicode, but if we have to maintain a 3rd version this becomes lunacy).

Jeff
INSOFT-L List Manager

*========================================================================*
Jeff Bowyer                              EMail:
Computing Center                         jbowyer@cis.vutbr.cz
Technical University of Brno
Udolni 19, 602 00 BRNO
Czech Republic
*========================================================================*

Sender: Glenn Adams <glenn@metis.com>
Technical Director, Unicode Consortium

Such an opinion that "[Unicode] is not acceptable to the Japanese market" is quite premature, don't you think? How many Unicode products are on the market in Japan? [I know of one, the Kanji version of GO's PenPoint operating system; I'm sure others are being developed now.] For a given Unicode product on the market in Japan, in what way is it "not acceptable"? It is a bit pointless to make broad statements like this without providing a few facts, such as: what specifically is it about Unicode that makes it unsuitable for use in Japan? Whether the Japanese voted yes or no (on ISO/IEC 10646, not Unicode) is not relevant to the feasibility of Unicode (and ISO/IEC 10646) in Japan.

I will offer some comments on the feasibility of Unicode in Japan:

1. Unicode is only a character set, period. It is not a font. It is not an I18N subsystem. It is not a library which applications may use to implement I18N solutions. It is just a character set. Its purpose is to represent character data of all languages (and not to meet the special needs of a particular language over and above the task of representing its character repertoire).

2. Unicode contains all of the characters in the three JIS standards JIS X 0201, JIS X 0208, and JIS X 0212. In addition, it contains many CJK ideographs which are not in these character sets but which are used as 'gaiji' characters by users in Japan.
In order to convert between JIS data (of various encoding styles such as EUC-JP, ISO 2022-JP, Shift JIS, etc.) and Unicode, all that is required is a 2-way mapping table for each of the three JIS standards. The mapping table for JIS X 0208 requires approximately 26 Kbytes to support mappings in both directions. Such a mapping loses no data; i.e., round-trip conversion is possible without losing any information.

3. Unicode does not prescribe the visual appearance of a character; therefore, in displaying Japanese text encoded in Unicode, it would be quite proper to use a JIS X 0208 or 0212 encoded font, or a Japanese font of any other encoding for that matter.

4. Unicode does not prescribe the sorting order of a character; therefore, in sorting Japanese text encoded in Unicode, it would be quite proper to use a JIS-oriented sorting weight table if a user expects text to be ordered according to JIS, or any other acceptable sorting order desired by the user. [One should recognize that a sorting order based on JIS order is not entirely acceptable in many cases either; e.g., two completely separate sorting methods are used in JIS X 0208, one based on phonetic order (for level 1 kanji), the other based on radical/stroke order (for level 2 kanji); JIS X 0212 (level 3 kanji) also uses the radical/stroke technique. There have been a number of articles published in Japan about problems with depending on the order prescribed by JIS for performing sorts which meet user expectations.]

Given the above facts, one can only conclude that there is no technical reason whatsoever that a given piece of software using Unicode couldn't meet the needs of Japanese users according to the capabilities provided by a character set. Of course, Unicode, as a character set, needs software to make it useful. It is that software which must meet the needs of Japanese users; Unicode by itself does not prevent such needs being met as well as existing JIS-based systems do.
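[A latter-day illustration of the round-trip property described above, sketched in modern Python; its bundled shift_jis codec tables stand in for the 2-way JIS mapping tables Glenn describes.]

```python
# Round-trip conversion between a legacy JIS encoding (Shift JIS here)
# and Unicode. Python's built-in codec is used as a stand-in for the
# 2-way mapping table described above.
text = "\u65e5\u672c\u8a9e"            # the word "Japanese" in kanji
sjis_bytes = text.encode("shift_jis")  # Unicode -> Shift JIS
back = sjis_bytes.decode("shift_jis")  # Shift JIS -> Unicode
assert back == text                    # no information lost on the round trip
```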
If anything, it facilitates creating better, more encompassing Japanese software systems, since it incorporates all JIS sets into a single fixed-width character set. This by itself will help the development of Japanese software by providing a way to migrate away from stateful, multibyte encoding systems such as ISO 2022 or EUC-JP.

You might wonder why I haven't mentioned anything about the unification of Chinese, Japanese, and Korean ideographs in Unicode. The reason I didn't mention it is that it doesn't have a bearing on using Unicode to represent Japanese text. Nor does it have a bearing on using Unicode to represent Chinese text. The only time it does have a bearing is when one wants to represent mixed CJK text. In such a case, if one wants to employ a different font to display the same character in a way that is acceptable to a Chinese reader versus a Japanese reader, then it will be necessary to augment such a representation with font tags (or with language tags) so as to enable selecting the correct font. For a non-Unicode system to do this now would also require doing the same, or, alternatively, mixing multiple character sets such as JIS X 0208, GB 2312, and KS C 5601, and then selecting different fonts based on the character set. Such a technique requires tagging the character set of a given character, from which the language or font can be inferred. In the case of Unicode, one would use a single character set, Unicode, and then employ a font or language tag explicitly. The difference between the two techniques is negligible. However, using Unicode gives one the ability not to require processing multiple character sets at the same time. This, after all, was perhaps one of the most important goals of Unicode.

"Does this mean a product that supports Unicode alone will not be as acceptable as a product that handles Japanese character sets using other encoding methods?"
The acceptability of a product will be based on whether it meets a user's needs, not on what character set it uses (assuming that the given character set meets the character repertoire needs in the first place, which Unicode does).

"What one other encoding should a product use in addition to Unicode in order to succeed in Japan? Or will Unicode be adapted to succeed? How soon?"

No other encoding is needed (internally). Of course, an application will need to be able to import/export text in other existing character sets, but it needn't use them internally. Unicode will not be adapted to meet non-existent needs. If the Japanese (or anyone else) do identify real needs, then they will certainly be evaluated. The Unicode Consortium welcomes public participation in its technical discussions and at its meetings.

Keep in mind that Unicode, Version 1.1, is now synchronized with ISO/IEC 10646-1:1993, developed by ISO/IEC JTC1/SC2/WG2. Unicode is essentially a usage profile of ISO/IEC 10646-1:1993 (UCS-2, Level 3). Because ISO/IEC 10646 is a very important international standard, the Japanese computer and communications industries will certainly take it into account as a legitimate standard.

Sender: Asmus Freytag <asmusf@microsoft.com>

| I have heard that while Unicode contains Kanji, it does so in a way
| that is not acceptable to the Japanese market, and hence was not
| approved by them in recent votes. Does this mean a product that
| supports Unicode alone will not be as acceptable as a product that
| handles Japanese character sets using other encoding methods.

To see why this is a strange question, let's ask it about English. What about the acceptability of products for English that support only Unicode (and not ASCII)? Well, first off, how would I (the user) know it didn't handle ASCII? Only if it fails to import data files in ASCII; in other words, "if it is not compatible".
Nobody wants incompatible products, so new products ALWAYS need to be prepared to deal with existing data.

| What one other encoding should a product use in addition to Unicode
| in order to succeed in Japan? Or will Unicode be adapted to succeed?
| How soon?

The answer to the first half depends on the user. In a PC environment in Japan, Shift-JIS (with vendor extensions) is the most widely used native character encoding. A new piece of software supporting Unicode must be able to accept existing data files in Shift-JIS.

| (It is bad enough we will probably make two versions of some products, 8 bit
| ISO-Latin and Unicode, but if we have to maintain a 3rd version this becomes
| lunacy).

The choice is more like this: do I make a separate 8-bit (Latin-1) and Shift-JIS version of my application, or do I create a single Unicode version (with appropriate compatibility with existing data, e.g. by one-time or on-the-fly conversion)?

In the case of Windows NT, Microsoft decided to create only one version of the core operating system. The kernel, file system, etc. all support exclusively Unicode. At the same time, Windows NT for Japan will be a compatible player in the Shift-JIS environment: it will run Shift-JIS Windows and DOS applications out of the box, the file system is able to read and write directory information on disks that were formatted using Shift-JIS, etc. So for the user in Japan interested in doing just what he or she has always been doing, there will be no observable change--despite the fact that the system supports _only_ Unicode in its bowels.

The change is visible for Microsoft, where adapting the system to Japan is taking less than half the amount of time it took for Windows 3.1, and for application vendors who now, for the first time, have the choice of writing only _one_ version of their application for NT, namely the Unicode version.
This Unicode version will run without modifications (other than translating the user interface) on the Japanese version of NT as well as any other version of NT. It is this simplification that will drive the acceptance of Unicode as a delivery vehicle in the long run, especially as the vast majority of packaged software is created by vendors who sell software globally.

I have used the term "delivery vehicle" deliberately. Nobody expects that overnight all existing data (and host computers, and...) will have transformed themselves. So what we are looking for is a vehicle that lets us cost-effectively provide software that delivers functionality into markets with very different legacy character sets. Unicode is the delivery vehicle for that kind of future software.

Sender: Chiaki Ishikawa (pmcgw!personal-media.co.jp!ishikawa@uunet.UU.NET)
Personal Media Corp.
Tokyo, Japan

>> I have heard that while Unicode contains Kanji, it does so in a way that is not
>> acceptable to the Japanese market, and hence was not approved by them in recent
>> votes. Does this mean a product that supports Unicode alone will not be as
>> acceptable as a product that handles Japanese character sets using other
>> encoding methods.

There are a few things that you ask or mention as you heard them on the grapevine.

(1) "I have heard that while Unicode contains Kanji, it does so in a way that is not acceptable to the Japanese market, and hence was not approved by them in recent votes."

Now, I can't speak for all of the Japanese programming community. Yes, there is a trend to move against Unicode. Basically the objection seems to come from (a) the rather uncalled-for (depending on your point of view, it WAS called for) `unification' and (b) the collision between Unicode and the then (or previously) on-going JIS efforts for multibyte character code standardization.

How strong is the opposing movement?
The part of the Japanese computer community who are unhappy with the Unicode standard have already formed a small committee within an industrial association to study a standard BEYOND Unicode (I think they talk about a 4-byte code, but I am not sure). Their attitude seems to be to silently ignore Unicode and move on to the NEXT generation of standard when software/hardware and market maturity reach the point where multi-language character handling is a MUST for the majority of computer platforms in the next several years.

Please note that since Unicode is already (part of) an ISO standard, there will be a corresponding Japanese standard in the not-so-distant future. However, a standard is only valid as long as users/vendors stick to it. If the standard is not followed by the user/vendor community very well, it will die of obsolescence. Whether the ignoring tactics of the Unicode opponents succeed depends upon the success of Unicode in Japan, and I think that is directly related to the success of Windows NT.

[I myself think (a) unification was NOT carried out with all the grace and cultural acceptability to the taste of the Japanese computer community. I am sorry that I don't have time for discussion right now; I am too busy doing work for a project that has to be finished within this year. I thank Glenn of the Unicode Consortium, who seems to read this mailing list, for having enlightened me about Unicode early this year when I had time to post to comp.std.internat. He might be amused/shocked to learn of the small committee which I mention above. Somehow the reading-room magazine rack of my office had the literature from the committee. It is full of a mumbo-jumbo ABC soup of standards acronyms, and it is hard to understand their own opinion. But I think my description of their intention is accurate enough. You might want to monitor the comp.std.internat newsgroup for occasional heated arguments therein regarding Unicode.]
(2) "Does this mean a product that supports Unicode alone will not be as acceptable as a product that handles Japanese character sets using other encoding methods?"

Yes, definitely. If you are not prepared and your product doesn't fly, don't blame it on the Japanese distribution system :-)

"What one other encoding should a product use in addition to Unicode in order to succeed in Japan? Or will Unicode be adapted to succeed? How soon?"

My personal opinion is that the success of Unicode hinges on the acceptance of Windows NT in Japan. This is because Microsoft Windows NT is using Unicode. It is anyone's guess whether WNT will be THE desktop on which multi-lingual word processing, for example, will take place in the next few years. (Aside from the Intel platform, DEC also promotes the Alpha PC, which runs Windows NT as well as OpenVMS and Ultrix. But, DEC being in the low profile it is in in terms of commercial success, it is not followed closely by the majority of the Japanese market.)

It will be a couple of years before we know whether Windows NT will be a success in the sense of Windows's success. So we don't know whether Unicode will be widely adopted or not in Japan. In the meantime, you can't come to Japan and expect reasonable marketing success unless your system supports:

- (MS-DOS): Shift-JIS Japanese code system. Oh, incidentally, this is the character code system used by Windows applications, too.

- (UNIX): EUC ... Japanese Extended Unix Code. Some WS vendors use Shift-JIS for better interoperability with DOS applications, but their market share is minor. HP is among them. But they seem to provide EUC support nowadays. Sun is the market leader and they use EUC, period.

[And if you need to talk to old mainframes, or yours is more concerned with communication software - (JIS): There are several JIS standards. You will know what you will need if your software is geared to communication.]

So, the choice of the character code system depends upon which target your product is meant for.
Ask your local representative of your target hardware company which code they use in Japan.

Currently, other than Microsoft Windows NT (and possibly some Japanese makers who might bundle Windows NT in Japan), I haven't heard of ANY Japanese computer manufacturer who supports Unicode today or has announced support for Unicode anytime soon. (Correct me if I am wrong.) Japanese mainframe vendors such as Nihon IBM, Fujitsu, Hitachi, and NEC have character code systems which are extensions of JIS to support unusual (slight variation) characters for proper nouns: names of people and places. They add characters to the base JIS standard they have adopted. This and other compatibility issues were being addressed by the current and then on-going JIS standards activity to codify a character code system at the ISO level when UNICODE preempted it, so to speak.

I think an early JIS delegate to the ISO committee was somewhat taken aback to learn recently that Unicode people are now considering an extension to the current Unicode to accommodate more characters by clever encoding (allowing more than 2 bytes for a character), because he seems to have felt that if we could go above 2 bytes there was no need for "Han Unification" at all to begin with!

>> (It is bad enough we will probably make two versions of some products, 8 bit
>> ISO-Latin and Unicode, but if we have to maintain a 3rd version this becomes
>> lunacy).

You can say that again. I think there are people who are driven up the walls by the introduction of Unicode. Me? Although I think the UNICODE unification was a lousy idea, I will go as the wind blows. You can't argue with the majority of your target audience as far as character set issues are concerned. For example, I despise S-JIS and don't want to use it, but what can I do otherwise if I want to use a DOS PC in Japan? One consolation is that there is a one-to-one map between EUC <-> S-JIS <-> JIS (and Unicode).
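[That one-to-one map is what makes a single conversion filter sufficient. A latter-day sketch in modern Python, whose bundled euc_jp, shift_jis, and iso2022_jp codecs stand in for the mapping tables; any of the JIS-family encodings converts to any other by pivoting through Unicode.]

```python
# A minimal conversion filter: decode the source encoding to Unicode,
# then encode Unicode to the destination encoding. One pivot, N codes.
def convert(data: bytes, src: str, dst: str) -> bytes:
    return data.decode(src).encode(dst)

text = "\u6771\u4eac"                           # "Tokyo" in kanji
euc = text.encode("euc_jp")
sjis = convert(euc, "euc_jp", "shift_jis")      # EUC -> Unicode -> Shift JIS
jis = convert(sjis, "shift_jis", "iso2022_jp")  # Shift JIS -> Unicode -> JIS
assert jis.decode("iso2022_jp") == text         # nothing lost along the way
```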
So if you have a conversion filter built once, at least the conversion shouldn't be THAT hard.

Sender: David Goldsmith <David_Goldsmith@taligent.com>
Taligent, Inc.

This question was forwarded to the 10646 mailing list. John Jenkins of Taligent collected the replies there, and I am forwarding them on to this list.

---------------

From ISO10646@JHUVM.HCF.JHU.EDU Wed Oct 14 14:52:28 1998
Date: Mon, 9 Aug 1993 08:50:38 EDT
Reply-To: Multi-byte Code Issues <ISO10646@JHUVM.HCF.JHU.EDU>
Sender: Multi-byte Code Issues <ISO10646@JHUVM.HCF.JHU.EDU>
From: Edwin Hart <HART%APLVM.BITNET@cunyvm.cuny.edu>
Subject: Re: Does Unicode satisfy Japanese market?
X-To: insoft-l@cis.vutbr.cs, Multi-Byte Code Issues <ISO10646@JHUVM.BITNET>
To: Multiple recipients of list ISO10646 <ISO10646@JHUVM.HCF.JHU.EDU>

Glenn Adams gave an excellent response to this question. I represent a large set of IBM customers to the U.S. technical standards committee for codes and character sets. While I cannot speak for Japanese customers, I have a perspective on U.S. and some Canadian customers that may or may not be valid for other customers.

1. Customers do not care about how the information is coded. (That is, unless the way the information is coded causes something like printing/displaying or communication to fail. Even then, they do not care about the encoding; they want to know how to fix the problem with the minimum of technical details.)

2. Customers care about the applications that they use.

3. With respect to coding, customers have these types of questions about the applications:

a. Are the characters I need available? As Glenn said, the answer is yes, because Unicode/10646-1 contains the Kanji characters from the 3 Japanese standards.

b. How do I enter them? This is not a coding issue.

c. Do the characters display and print correctly? In Japan, this is a font issue, and Japan has fonts for the JIS standard characters in Unicode/10646-1.
I would presume that anyone who wants a product to be successful in Japan will use the Japanese fonts. Everything I have heard indicates that this is one of THE major issues for the Japanese.

d. Can I correctly communicate the information in the coded characters to others? This is a coding issue. However, as Glenn discussed, other information (such as the font/country) may also be required to display the correct shape for the character.

e. Do the characters sort correctly? This is not a coding issue. It is an internationalization (I18N) issue. The sorting software needs to account for culturally-correct sorting.

In summary, the issue is not how the characters are coded in an application but how well an application meets the customer's expectations and needs. Unicode and ISO/IEC 10646-1 are part of the solution to meeting customer needs, but meeting those needs requires much, much more than Unicode and 10646.

Ed Hart

From ISO10646@JHUVM.HCF.JHU.EDU Wed Oct 14 14:52:29 1998
Date: Tue, 10 Aug 1993 11:39:18 +0200
Reply-To: Multi-byte Code Issues <ISO10646@JHUVM.HCF.JHU.EDU>
Sender: Multi-byte Code Issues <ISO10646@JHUVM.HCF.JHU.EDU>
From: Andr'e PIRARD <PIRARD%BLIULG11.BITNET@cunyvm.cuny.edu>
Organization: University of Liege (Belgium), SEGI (Computing Center)
Subject: Re: Does Unicode satisfy Japanese market?
To: Multiple recipients of list ISO10646 <ISO10646@JHUVM.HCF.JHU.EDU>

On Mon, 9 Aug 1993 08:50:38 EDT Edwin Hart said:

>3. With respect to coding, customers have these types of questions about the
>applications:
>d. Can I correctly communicate the information in the coded characters
>to others?
>This is a coding issue. However, as Glenn discussed, other information (such
>as the font/country) may also be required to display the correct shape for
>the character.

I still wonder what's the defined encoding to send Unicode on communication paths, and whether Unicode hosts will be compatible with ISO 10646 hosts in this respect.
In other words: "Does Unicode satisfy the _whole_ market?" Or: "Is it just a font or more?"

It must be realized that, from a communication perspective, what's happening inside a machine (what code it uses) is of little concern as long as each machine appears to the others as if it were using a common code, by using the same one (and same encoding) in communication. Bear in mind that communication also means mag tapes and CD-ROMs. The standards for data communication are the first things to consider. It saves time to design the machine internals second, accordingly. And it saves time to define global communication encoding standards rather than tackle the problem anew for each application protocol (e-mail, file transfer, terminal mode, database, and so on...). Just Latin-1 communication shows that putting immediate interest first has been a mistake of the past. And communication is now a fact. For everybody, I hope.

Sender: John Finlayson (johnf@findog.HQ.Ileaf.COM)

The most common code used in Japan is Shift-JIS, which is used by PCs, Macs, and many workstations, although there is a trend toward EUC (Extended UNIX Code) on Unix. For more info on Japanese text encoding, I recommend Ken Lunde's japan.inf, available (last I checked) via anonymous ftp from ucdavis.edu (128.120.2.1) in the pub/JIS directory.

Sender: Martin "J." Duerst <mduerst@ifi.unizh.ch>
Institute for Informatics
University of Zurich

As far as I have heard, the reason that the Japanese voted No has to do with technicalities of the standardization process. It did not mean that Unicode was disapproved altogether by the Japanese. A second point is that not all Japanese are equally happy with Unicode, but then, probably not all Americans share exactly the same view about it, either.

What is more important for you is the fact that although Unicode may become the future standard all over the world (and so in Japan, too), it is not yet that common anywhere. Thus conversions from and to Unicode are a must in every Unicode product, esp.
if it works in an open environment. That then means that if conversion is rare, and your users are experts, conversion between Unicode and one of the three popular codes used in Japan is necessary. If conversion is frequent, and your users are novices, conversions to all those codes are necessary. What is especially nice is a program that can detect the code of an input file. The names of the codes are JIS, S-JIS (also called Shift-JIS), and EUC. For more information, consult the book by Ken Lunde that will appear soon. (I think some preliminary information is available on the ftp server of insoft-l.)

[Note from Jeff:
Site: rhino.cis.vutbr.cz
Directory: pub/lists/insoft-l/doc/unicode ]

Sender: Steve R. Billings (srb@world.std.com)

Our 2 Japanese customers require SJIS, which (I just learned from this newsgroup) is contained as a subset within Unicode. So, I believe the answer to your question is "yes", but our customers have not signed off on it yet.

P.S. If you learn of any reason why a Japanese customer would NOT be satisfied with Unicode, I (and probably many others in this newsgroup) would be very interested in hearing about it.

-------------------- end --------------------
Received on Thursday, 27 January 1994 00:48:42 UTC