Mailing list INSOFT-L

Sorry if this is old news to you folks, but I just stumbled on
a mailing list of interest to the ietf-charsets group.
Here is an example posting pulled from their archive.
This is the same group that is starting up the journal _I18N_.
They are looking for a new moderator, by the way.

p.s. This article contains a good summary of the Unicode vs. Japan
situation, as seen by the West.  Mr. Ohta's position is apparently
shared by many in the East, although they express it less forcefully.
The key worry is achieving pleasing display of mixed Chinese/Korean/Japanese
text, which requires encoding language (or equivalently, font).
The Japanese position is that this should be covered by any character
set standard; the West's position is that this is external to the
standard.  Another problem is that the Han unification was not done
with the politeness needed for success in Japan.
Mr. Ohta, am I close?

- Dan Kegel (dank@alumni.caltech.edu)

----------------- example from insoft-l@cis.vutbr.cs ----------------- 

>From: Bowyer Jeff <jbowyer@cis.vutbr.cs>
Subject: Consolidated Answers for "Unicode and the Japanese Market"
To: insoft-l@cis.vutbr.cs
Date: Fri, 13 Aug 1993 14:03:05 +0200 (MET DST)
Reply-To: jbowyer@cis.vutbr.cs
X-Mailer: ELM [version 2.4 PL20]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 25207     

Here are the consolidated responses to the original questions from
Bob Peterson ("gemgrp::peterson"@tle.enet.dec.com):

     I have heard that while Unicode contains Kanji, it does so in a way
     that is not acceptable to the Japanese market, and hence was not
     approved by them in recent votes.  Does this mean a product that
     supports Unicode alone will not be as acceptable as a product that
     handles Japanese character sets using other encoding methods?

     What one other encoding should a product use in addition to Unicode
     in order to succeed in Japan?  Or will Unicode be adapted to succeed?
     How soon?

     (It is bad enough we will probably make two versions of some products,
      8 bit ISO-Latin and Unicode, but if we have to maintain a 3rd version
      this becomes lunacy).

Jeff
INSOFT-L List Manager

*========================================================================*
 Jeff Bowyer                         EMail: 
 Computing Center		     jbowyer@cis.vutbr.cz
 Technical University of Brno
 Udolni 19, 602 00 BRNO
 Czech Republic
*========================================================================*


Sender: Glenn Adams <glenn@metis.com>
        Technical Director, Unicode Consortium

The opinion that "[Unicode] is not acceptable to the Japanese market"
seems quite premature, don't you think?  How many Unicode products are on
the market in Japan?  [I know of one, the Kanji version of GO's Penpoint
operating system; I'm sure others are being developed now.]  For a given
Unicode product on the market in Japan, in what way is it "not acceptable"?

It is a bit pointless to make broad statements like this without providing
a few facts, such as: what specifically is it about Unicode that makes it
unsuitable for use in Japan?  Whether the Japanese voted yes or no (on
ISO/IEC 10646, and not Unicode) is not relevant to the feasibility of
Unicode (and ISO/IEC 10646) in Japan.

I will offer some comments on the feasibility of Unicode in Japan:

1. Unicode is only a character set, period.  It is not a font.  It
is not an I18N subsystem.  It is not a library which applications may
use to implement I18N solutions.  It is just a character set.  Its
purpose is to represent character data of all languages (and not to
meet the special needs of a particular language over and above the
task of representing its character repertoire).

2. Unicode contains all of the characters in the three JIS standards
JIS X 0201, JIS X 0208, and JIS X 0212.  In addition, it contains many
CJK ideographs which are not in these character sets but which are used
as 'gaiji' characters by users in Japan.  In order to convert between JIS
data (of various encoding styles such as EUC-JP, ISO 2022J, Shift JIS,
etc.) and Unicode, all that is required is a 2-way mapping table for
each of the three JIS standards.  The mapping table for JIS X 0208
requires approximately 26Kbytes to support mappings in both directions.
Such a mapping loses no data; i.e., round-trip conversion is possible
without losing any information.
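The round-trip property described above can be sketched in a few lines of Python, whose standard codecs play the role of the two-way mapping tables mentioned (the `shift_jis` codec here stands in for any one JIS encoding style):

```python
# Round-trip conversion between a legacy JIS encoding (Shift JIS here)
# and Unicode.  Python's built-in codecs act as the two-way mapping
# tables described above.

text = "\u65e5\u672c\u8a9e"  # "Nihongo" (Japanese language) in kanji

# Unicode -> Shift JIS bytes
sjis_bytes = text.encode("shift_jis")

# Shift JIS bytes -> Unicode again
round_trip = sjis_bytes.decode("shift_jis")

# No information is lost going through Unicode and back
assert round_trip == text
```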

3. Unicode does not prescribe the visual appearance of a character; therefore,
in displaying Japanese text encoded in Unicode, it would be quite proper to
use a JIS 0208 or 0212 encoded font, or a Japanese font of any other encoding
for that matter.

4. Unicode does not prescribe the sorting order of a character; therefore,
in sorting Japanese text encoded in Unicode, it would be quite proper to
use a JIS oriented sorting weight table if a user expects text to be ordered
according to JIS, or by any other acceptable sorting order desired by the
user.  [One should recognize that a sorting order based on JIS order is
not entirely acceptable in many cases either; e.g., two completely separate
sorting methods are used in JIS X 0208, one based on phonetic order (for
level 1 kanji), the other based on radical/stroke order (for level 2 kanji);
JIS X 0212 (level 3 kanji) also uses the radical/stroke technique.  There
have been a number of articles published in Japan about problems with
depending on the order prescribed by JIS for performing sorts which meet
user expectations.]
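The idea of an external sorting weight table, as described above, can be sketched as follows; the three-entry table is purely hypothetical, and a real implementation would load JIS-derived or phonetic weights for the full repertoire:

```python
# Minimal sketch: sorting Unicode-encoded Japanese text with an external
# weight table rather than by code point order.  The table below is a
# hypothetical illustration, not real JIS collation data.

weights = {"\u3042": 1, "\u304b": 2, "\u3055": 3}  # hiragana a, ka, sa

def sort_key(word):
    # Characters absent from the table sort after all weighted ones.
    return [weights.get(ch, len(weights) + 1) for ch in word]

words = ["\u3055", "\u3042", "\u304b"]
assert sorted(words, key=sort_key) == ["\u3042", "\u304b", "\u3055"]
```

Swapping in a different weight table (phonetic vs. radical/stroke) changes the order without touching the character encoding at all, which is the point being made above.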

Given the above facts, one can only conclude that there is no technical
reason whatsoever that a given piece of software using Unicode couldn't
meet the needs of Japanese users according to the capabilities provided
by a character set.  Of course, Unicode, as a character set, needs software
to make it useful.  It is that software which must meet the needs of
Japanese users; Unicode by itself does not prevent such needs being met
as well as existing JIS based systems.  If anything it facilitates creating
better, more encompassing Japanese software systems since it incorporates
all JIS sets into a single fixed width character set.  This by itself
will help the development of Japanese software by providing a way to
migrate away from the stateful, multibyte encoding systems such as
ISO 2022 or EUC-JP.

You might wonder why I haven't mentioned anything about the unification
of Chinese, Japanese, and Korean ideographs in Unicode.  The reason I
didn't mention it is because it doesn't have a bearing on using Unicode
to represent Japanese text.  Nor does it have a bearing on using Unicode
to represent Chinese text.  The only time it does have a bearing is
when one wants to represent mixed CJK text.  In such a case, if one wants
to employ a different font to display the same character in a way that
is acceptable to a Chinese reader versus a Japanese reader, then it will
be necessary to augment such a representation with font tags (or with
language tags) so as to enable selecting the correct font.  For a non-
Unicode system to do this now would also require doing the same, or,
alternatively, mixing multiple character sets such as JIS X 0208, GB 2312,
and KS C 5601, and then selecting different fonts based on the character
set.  Such a technique requires tagging the character set of a given
character from which the language or font can be inferred.  In the case
of Unicode, one would use a single character set, Unicode, and then
employ a font or language tag explicitly.  The difference between the
two techniques is negligible.  However, using Unicode means one need not
process multiple character sets at the same time.
This, after all, was perhaps one of the most important goals of Unicode.
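The tagging scheme described above can be sketched as a list of (language-tag, text) runs from which a renderer picks a locale-appropriate font; the tag names and font names here are illustrative assumptions, not part of any standard:

```python
# Sketch: mixed CJK text as (language-tag, text) runs, so a renderer can
# choose a different font for the same unified ideograph.  Tag and font
# names are hypothetical.

fonts = {"ja": "Japanese-Mincho", "zh": "Chinese-Song", "ko": "Korean-Batang"}

document = [
    ("ja", "\u76f4"),  # the same unified ideograph...
    ("zh", "\u76f4"),  # ...displayed with a language-appropriate font
]

def font_for(run):
    lang, _text = run
    return fonts.get(lang, "default")

assert [font_for(r) for r in document] == ["Japanese-Mincho", "Chinese-Song"]
```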

  Does this mean a product that supports Unicode alone will not be as
  acceptable as a product that handles Japanese character sets using
  other encoding methods?

The acceptability of a product will be based on whether it meets a user's
needs, not on what character set it uses (assuming that the given character
set meets the character repertoire needs in the first place, which Unicode
does).

  What one other encoding should a product use in addition to Unicode in
  order to succeed in Japan?  Or will Unicode be adapted to succeed? 

  How soon?

No other encoding is needed (internally).  Of course an application will
need to be able to import/export text in other existing character sets,
but it needn't use them internally.

Unicode will not be adapted to meet non-existent needs.  If the Japanese
(or anyone else) do identify real needs, then they will certainly be
evaluated.  The Unicode consortium welcomes public participation in
its technical discussions and at its meetings.

Keep in mind that Unicode, Version 1.1, is now synchronized with ISO/IEC
10646-1:1993, developed by ISO/IEC JTC1/SC2/WG2.  Unicode is essentially
a usage profile of ISO/IEC 10646-1:1993 (UCS-2, Level 3).  Because ISO/IEC
10646 is a very important international standard, the Japanese development
computer and communications industries will certainly take it into account
as a legitimate standard.


Sender: Asmus Freytag <asmusf@microsoft.com>

| I have heard that while Unicode contains Kanji, it does so in a way 
| that is not acceptable to the Japanese market, and hence was not
| approved by them in recent votes.  Does this mean a product that
| supports Unicode alone will not be as acceptable as a product that
| handles Japanese character sets using other encoding methods?

To see why this is a strange question, let's ask it about English.  What
about the acceptability of products for English that support only Unicode
(and not ASCII)?  Well, first off, how would I (the user) know it didn't
handle ASCII?  Only if it fails to import data files in ASCII, in other
words, "if it is not compatible".  Nobody wants incompatible products, so
new products ALWAYS need to be prepared to deal with existing data.

| What one other encoding should a product use in addition to Unicode 
| in order to succeed in Japan?  Or will Unicode be adapted to succeed?
| How soon?

The answer to the first half depends on the user.  In a PC environment
in Japan, Shift-JIS (with vendor extensions) is the most widely used
native character encoding.  A new piece of software supporting Unicode
must be able to accept existing data files in Shift-JIS.

| (It is bad enough we will probably make two versions of some products, 8 bit
| ISO-Latin and Unicode, but if we have to maintain a 3rd version this becomes
| lunacy).

The choice is more one like this:  Do I make a separate 8-bit (Latin-1)
and Shift-JIS version of my application, or do I create a single Unicode
version (with appropriate compatibility with existing data, e.g.  by
one-time or on the fly conversion).

In case of Windows NT, Microsoft decided to only create one version of
the core operating system.  The kernel, file system, etc.  all support
exclusively Unicode.  At the same time Windows NT for Japan will be a
compatible player in the Shift-JIS environment:  It will run Shift-JIS
Windows and DOS applications out of the box, the file system is able to
read and write directory information on disks that were formatted using
Shift-JIS, etc.  So for the user in Japan interested in doing just what
he or she has always been doing, there will be no observable
change--despite the fact that the system supports _only_ Unicode in its
bowels.

The change is visible for Microsoft, where adapting the system to Japan
is taking less than half the amount of time it took for Windows 3.1, and
for application vendors who now, for the first time, have the choice of
writing only _one_ version of their application for NT, namely the
Unicode version.  This Unicode version will run without modifications
(other than translating the user interface) on the Japanese version of NT
as well as any other version of NT.

It is this simplification that will drive the acceptance of Unicode as a
delivery vehicle in the long run, especially as the vast majority of
packaged software is created by vendors who sell software globally.

I have used the term "delivery vehicle" deliberately.  Nobody expects
that overnight all existing data (and host computers, and...)  will have
transformed themselves.  So what we are looking for is a vehicle that
lets us provide software cost effectively that delivers functionality
into markets with very different legacy character sets.  Unicode is the
delivery vehicle for that kind of future software.


Sender: Chiaki Ishikawa (pmcgw!personal-media.co.jp!ishikawa@uunet.UU.NET)
        Personal Media Corp.
	Tokyo, Japan

>> I have heard that while Unicode contains Kanji, it does so in a way that is not
>> acceptable to the Japanese market, and hence was not approved by them in recent 
>> votes.  Does this mean a product that supports Unicode alone will not be as
>> acceptable as a product that handles Japanese character sets using other 
>> encoding methods.  

There are a few things that you ask about or mention, having heard them
on the grapevine.

(1)  "I have heard that while Unicode contains Kanji, it does so in a way that is not
      acceptable to the Japanese market, and hence was not approved by
      them in recent votes."

Now, I can't speak for all the Japanese programming community. 
Yes, there is a trend to move against Unicode.
Basically the objection seems to come from 

(a) the rather uncalled-for (depending on your point of view, it WAS called
    for) `unification', and

(b) the collision between Unicode and the then (or previously) on-going JIS
    efforts for multibyte character code standardization.

How strong is the opposing movement?
The part of the Japanese computer community that is unhappy with the
Unicode standard has already formed a small committee within an industrial
association to study a standard BEYOND Unicode (I think they talk
about a 4-byte code, but I am not sure).

Their attitude seems to be to silently ignore Unicode and move on to
the NEXT generation of standard when software/hardware and market
maturity reach the point where multi-language character handling is
a MUST for the majority of computer platforms in the next several
years.

Please note that since Unicode is already (part of) an ISO standard,
there will be a corresponding Japanese standard in the not so distant
future. However, a standard is only valid as long as the user/vendor
community sticks to it. If the standard is not followed by the user/vendor
community very well, it will die of obsolescence.

Whether the opponents' tactic of ignoring Unicode succeeds depends
upon the success of Unicode in Japan, and I think that is directly
related to the success of Windows NT.

[I myself think the (a) unification was NOT carried out with all the grace
and cultural acceptability to the taste of the Japanese computer
community.  I am sorry that I don't have time for discussion right
now; I am too busy doing work for a project that has to be finished
within this year. I thank Glenn of the Unicode Consortium, who seems to
read this mailing list, for having enlightened me about Unicode early this
year when I had time to post to comp.std.internat. He might be
amused/shocked to learn of the small committee which I mention
above. Somehow the reading-room magazine rack of my office had the
literature from the committee. It is full of an alphabet soup of
standards acronyms, and it is hard to make out their own opinion. But I
think my description of their intention is accurate enough. You might
want to monitor the comp.std.internat newsgroup for occasional heated
arguments therein regarding Unicode.]

(2)   "Does this mean a product that supports Unicode alone will not be as
	acceptable as a product that handles Japanese character sets using other
	encoding methods?"

Yes, definitely. If you are not prepared and your product doesn't fly,
don't blame it on the Japanese distribution system :-)

	"What one other encoding should a product use in addition to Unicode in order to 
	succeed in Japan?  Or will Unicode be adapted to succeed?  How soon?"

My personal opinion is that the success of Unicode hinges on the
acceptance of Windows NT in Japan. This is because Microsoft
Windows NT is using Unicode. It is anyone's guess whether WNT will be
THE desktop on which multi-lingual word processing, for example, will
take place in the next few years.  (Aside from the Intel platform, DEC
also promotes the Alpha PC, which runs Windows NT as well as OpenVMS and
Ultrix. But, DEC having the low profile it does in terms of
commercial success, it is not followed closely by the majority of the
Japanese market.)

It will be a couple of years before we know whether Windows NT will be
a success in the sense of Windows's success. So we don't know whether
Unicode will be widely adopted or not in Japan.  In the meantime, you
can't come to Japan and expect reasonable marketing success unless your
system supports

- (MS-DOS): Shift-JIS Japanese code system
	Oh, incidentally, this is the character code system used by
	Windows applications, too.

- (UNIX): EUC ... Japanese Extended Unix Code.

Some WS vendors use Shift-JIS for better interoperability with DOS
applications, but their market share is minor. HP is among them.  But
they seem to have provided EUC support nowadays. Sun is the market
leader and they use EUC, period.

[and if you need to talk to old mainframes, or yours is more
 concerned with communication software - (JIS): There are
 several JIS standards. You will know what you will need if your
 software is geared to communication.]

So, the choice of the character code system depends upon which target your
product is meant for. Ask your local representative of your target
hardware company which code they use in Japan.

Currently, other than Microsoft Win/NT (and possibly some Japanese
makers who might bundle Win/NT in Japan), I haven't heard of ANY
Japanese computer manufacturer that supports Unicode today or has
announced support for Unicode anytime soon. (Correct me if I am wrong.)

Japanese mainframe vendors such as Nihon IBM, Fujitsu, Hitachi, and NEC
have character code systems which are extensions of JIS to support
unusual characters (slight variations) for proper nouns: names of
people and places.  They add characters to the base JIS standard they
have adopted.

This and other compatibility issues were being addressed by
current and then on-going JIS standard activity to codify a
character code system at the ISO level when UNICODE preempted it, so to
speak.  I think an early JIS delegate to the ISO committee was somewhat
taken aback to learn recently that Unicode people are now considering
an extension to the current Unicode to accommodate more characters by
clever encoding (allowing more than 2 bytes for a character), because
he seems to have felt that if we would go above 2 bytes there was no
need for "Han Unification" at all to begin with!

>> (It is bad enough we will probably make two versions of some products, 8 bit 
>> ISO-Latin and Unicode, but if we have to maintain a 3rd version this becomes 
>> lunacy).

You can say that again. I think there are people who are being driven up
the wall by the introduction of Unicode.

Me? Although I think the UNICODE unification was a lousy idea, I will
go as the wind blows. You can't argue with the majority of your target
audience as far as character set issues are concerned.
For example, I despise S-JIS and don't want to use it, but what else can
I do if I want to use a DOS PC in Japan?

One consolation is that there is a one-to-one map between
EUC <-> S-JIS <-> JIS (and Unicode).
So if you have a conversion filter built once, at least the conversion
shouldn't be THAT hard. 
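Such a conversion filter, pivoting through Unicode, can be sketched with Python's standard codecs (the codec names `euc_jp` and `shift_jis` are Python's spellings of the encodings discussed above):

```python
# A conversion filter between EUC-JP and Shift JIS that pivots through
# Unicode, as suggested above.  Python's standard codecs supply the maps.

def euc_to_sjis(data: bytes) -> bytes:
    return data.decode("euc_jp").encode("shift_jis")

def sjis_to_euc(data: bytes) -> bytes:
    return data.decode("shift_jis").encode("euc_jp")

sample = "\u6f22\u5b57".encode("euc_jp")  # "kanji" in EUC-JP

# The one-to-one mapping makes the filter lossless in both directions
assert sjis_to_euc(euc_to_sjis(sample)) == sample
```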


Sender: David Goldsmith <David_Goldsmith@taligent.com>
        Taligent, Inc.

This question was forwarded to the 10646 mailing list. John Jenkins of
Taligent collected the replies there, and I am forwarding them on to this
list.

---------------
From ISO10646@JHUVM.HCF.JHU.EDU  Wed Oct 14 14:52:28 1998
Date:	Mon, 9 Aug 1993 08:50:38 EDT
Reply-To: Multi-byte Code Issues <ISO10646@JHUVM.HCF.JHU.EDU> 
Sender: Edwin Hart <HART%APLVM.BITNET@cunyvm.cuny.edu> 
Subject: Re: Does Unicode satisfy Japanese market?
X-To:	insoft-l@cis.vutbr.cs, Multi-Byte Code Issues <ISO10646@JHUVM.BITNET>
To: Multiple recipients of list ISO10646 <ISO10646@JHUVM.HCF.JHU.EDU> 

Glenn Adams gave an excellent response to this question. 

I represent a large set of IBM customers to the U.S. technical standards
committee for codes and character sets. While I cannot speak for Japanese
customers, I have a perspective on U.S. and some Canadian customers that may
or may not be valid for other customers. 

1. Customers do not care about how the information is coded. (That is, unless 
the way the information is coded causes something like printing/displaying or
communication to fail. Even then, they do not care about the encoding; they
want to know how to fix the problem with the minimum of technical details.)
2. Customers care about the applications that they use.
3. With respect to coding, customers have these types of questions about the
applications:
a. Are the characters I need available?
As Glenn said, the answer is yes because Unicode/10646-1 contains the Kanji
characters from the 3 Japanese standards.
b. How do I enter them?
This is not a coding issue.
c. Do the characters display and print correctly? 
In Japan, this is a font issue and Japan has fonts for the JIS standard
characters in Unicode/10646-1. I would presume that anyone who wants a product
to be successful in Japan will use the Japanese fonts. Everything I have heard
indicates that this is one of THE major issues for the Japanese.
d. Can I correctly communicate the information in the coded characters 
to others?
This is a coding issue. However, as Glenn discussed, other information (such
as the font/country) may also be required to display the correct shape for the
character.
e. Do the characters sort correctly?
This is not a coding issue. It is an internationalization (I18N) issue. The
sorting software needs to account for culturally-correct sorting.

In summary, the issue is not how the characters are coded in an application
but how well an application meets the customer's expectations and needs.
Unicode and ISO/IEC 10646-1 are part of the solution to meeting customer
needs, but meeting customer needs requires much, much more than Unicode
and 10646.

Ed Hart

From ISO10646@JHUVM.HCF.JHU.EDU  Wed Oct 14 14:52:29 1998
Date:	Tue, 10 Aug 1993 11:39:18 +0200
Reply-To: Multi-byte Code Issues <ISO10646@JHUVM.HCF.JHU.EDU>
Sender: Multi-byte Code Issues <ISO10646@JHUVM.HCF.JHU.EDU>
From: André PIRARD <PIRARD%BLIULG11.BITNET@cunyvm.cuny.edu>
Organization: University of Liege (Belgium), SEGI (Computing Center) 
Subject: Re: Does Unicode satisfy Japanese market?
To: Multiple recipients of list ISO10646 <ISO10646@JHUVM.HCF.JHU.EDU> 

On Mon, 9 Aug 1993 08:50:38 EDT Edwin Hart said: 
>3. With respect to coding, customers have these types of questions about the
>applications:
>d. Can I correctly communicate the information in the coded characters
>to others?
>This is a coding issue. However, as Glenn discussed, other information (such
>as the font/country) may also be required to display the correct shape for the
>character.

I still wonder what the defined encoding is for sending Unicode over
communication paths, and whether Unicode hosts will be compatible with
ISO 10646 hosts in this respect.

In other words: "Does Unicode satisfy the _whole_ market?" Or: "Is it just a
font or more?"

It must be realized that, from a communication perspective, what's happening
inside a machine (what code it uses) is of little concern as long as each
machine appears to the others as if it were using a common code, by using
the same one (and the same encoding) in communication.  Bear in mind that
communication also means mag tapes and CD-ROMs.

The standards for data communication are the first things to consider.  It
saves time to design the machine internals second, accordingly.  And it
saves time to define global communication encoding standards rather than
tackle the problem anew for each application protocol (e-mail, file transfer,
terminal mode, database, and so on).

The Latin-1 experience alone shows that designing only for immediate needs
has been a mistake of the past.
And communication is now a fact.
For everybody, I hope.



Sender: John Finlayson (johnf@findog.HQ.Ileaf.COM)


The most common code used in Japan is Shift-JIS, which is used by
PCs, Macs, and many workstations, although there is a trend toward
EUC (Extended UNIX Code) on Unix.

For more info on Japanese text encoding, I recommend Ken Lunde's
japan.inf, available (last I checked) via anonymous ftp from
ucdavis.edu (128.120.2.1) in the pub/JIS directory.


Sender: Martin "J." Duerst <mduerst@ifi.unizh.ch>
        Institute for Informatics
	University of Zurich

As far as I have heard, the reason the Japanese voted No has to do
with technicalities of the standardization process. It did not mean that
Unicode was disapproved of altogether by the Japanese.
A second point is that not all Japanese are equally happy with Unicode,
but then, probably not all Americans share exactly the same view about
it, either.
What is more important for you is the fact that although Unicode may
become the future standard all over the World (and so in Japan, too),
it is not yet that common anywhere. Thus conversions from and to Unicode
are a must in every Unicode product, especially if it works in an open
environment.  That means that if conversion is rare and your users are
experts, conversion between Unicode and one of the three popular codes used
in Japan is necessary. If conversion is frequent and your users are novices,
conversion to all those codes is necessary. What is especially nice is a
program that can detect the code of an input file.
The names of the codes are JIS, S-JIS (also called Shift-JIS), and EUC.
For more information, consult the book by Ken Lunde that will appear
soon (I think some preliminary information is available on the ftp server
of insoft-l.)

[Note from Jeff:

     Site:      rhino.cis.vutbr.cz
     Directory: pub/lists/insoft-l/doc/unicode
]
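A program that detects the code of an input file, as Martin suggests, can be roughly sketched by attempting a strict decode in each candidate encoding in turn. Real detectors use statistical heuristics; this try-in-order approach (with the escape-based JIS encoding tried first, since its bytes are also valid ASCII/EUC) is only an illustration:

```python
# Rough sketch of guessing which Japanese encoding a data stream uses,
# by attempting a strict decode of each candidate in turn.  Order
# matters: iso2022_jp ("JIS") uses only 7-bit bytes, so it must be
# tried before the 8-bit encodings.

CANDIDATES = ["iso2022_jp", "euc_jp", "shift_jis"]  # JIS, EUC, S-JIS

def guess_encoding(data: bytes) -> str:
    for enc in CANDIDATES:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return "unknown"

jis_data = "\u65e5\u672c".encode("iso2022_jp")  # "Nihon" in JIS
assert guess_encoding(jis_data) == "iso2022_jp"
```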


Sender: Steve R. Billings (srb@world.std.com)

Our 2 Japanese customers require SJIS, which (I just learned from this
newsgroup) is contained as a subset within Unicode. So, I believe the answer
to your question is "yes", but our customers have not signed off on it yet.

P.S. If you learn of any reason why a Japanese customer would NOT
be satisfied with Unicode, I (and probably many others in this newsgroup)
would be very interested in hearing about it.

-------------------- end --------------------


Received on Thursday, 27 January 1994 00:48:42 UTC