wchar_t & Unicode in Word 97 (Re: New work item for XML group?)

(If this is off topic, please forgive me)

Gavin Nicol wrote:

  Some Unix systems define wchar_t to be 32 bits.

BYTE magazine said in March that some Unix systems use 8 bits for
wchar_t (surely this is wrong!).
Anyway, the fact that other standards-making bodies may have failed to
adequately define what
they mean by a wide character surely should make us want to be more
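
For anyone who wants to check the wchar_t width on their own system, here is a minimal sketch in Python. It assumes that the ctypes type c_wchar matches the C compiler's wchar_t on the platform and that a byte is 8 bits; the helper name wchar_bits is only illustrative.

```python
import ctypes


def wchar_bits() -> int:
    """Return the width in bits of the platform's wchar_t as ctypes sees it."""
    # ctypes.c_wchar corresponds to the C type wchar_t; sizeof() reports bytes.
    # Assumption: 8-bit bytes, which holds on the systems discussed here.
    return ctypes.sizeof(ctypes.c_wchar) * 8


if __name__ == "__main__":
    print(f"wchar_t is {wchar_bits()} bits wide on this platform")
```

The result differs between platforms, which is exactly the portability problem under discussion.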

As a side issue, but still on the subject of the desirability of
Unicode, here is part of a posting made today to a mailing list on
Asian documents:
Christian Wittern writes:

You might be aware of this, but it is completely new to me.

I recently had a look at Word 97. I played with late beta versions
of the English and Chinese programs on English, Japanese and Chinese
Windows 95 as well as German Win NT 4.0.

The internal file format of Word 97 is now Unicode. There seems to
be no difference in file formats between the Western and East Asian
versions. Now, finally, it is no longer necessary to keep copies of
the program in two, three or four different languages and operating
systems just to accommodate the need to process multiple Asian
languages. It is now possible to create a file, say, on Japanese
Windows in English Word 97 and open it in Chinese Windows and Chinese
Word 97, and the characters will display correctly without the need for
any conversion! (Since the standard font names are different, there
might be a need to set up the font mapping from the Japanese to the
Chinese fonts - this is done once and forever.)

Although the file format is Unicode, the CJK glyphs are still seen
through the limitations of the national encodings. This means that
Word 97 running on Chinese Windows will only display characters from
Big5, and Japanese Word only those from JIS; characters outside of these
ranges are displayed as a 'missing glyph' question mark, but are not
in any way distorted, deleted or mixed up. This is still true even if
the font used contains all the 20000+ CJK glyphs from the Unicode
standard, like for example Bitstream's Cyberbit (free download from
http://www.bitstream.com). This situation only changes when Word 97 is
running on NT: with the proper font installed, it will happily display
all CJK glyphs, thus finally, after *years* of pain, making it possible
to mix Asian and European text, with Sanskrit and whatever, in one
document. To some extent, this is even possible in English Windows
95, where it is possible to install the fonts from the Internet
Explorer Language Pack (www.microsoft.com and all over the planet,
search for ie3lpktw.exe, ie3lpkcn.exe and so forth).

Even the English version of Word 97, running in a CJK environment, is
now smart enough to register a switch of the keyboard from alphabetic
to ideographic input and will adjust the font accordingly.

It seems the day has finally come when we can switch our
text processing to Unicode and forget about things like Big5, JIS, KSC
and the like, to concentrate on the work we originally planned to do
with the help of a computer.

Someone then quibbled about input methods ...
Christian Wittern writes:

Well, maybe I did not express myself very well. What I wanted to say
is that, due to some "feature" in Far East Windows 95, Asian fonts can only be
installed with one Asian language flag, Japanese OR Chinese (Big5) OR
That is, even a font like Bitstream's Cyberbit, which contains *all*
20000+ CJK ideographs, will have to be installed as Japanese or
Taiwanese. This will cause the OS and Word 97 to filter out those
characters that do not belong to the specified language. So although
the encoding is unified to CJK, you still need extra fonts for the
East Asian regions, and some areas, like JIS X 0212, are still not

Of course all this applies to Win 95 in the different Far East
versions; Win NT 4.0 will happily display all the Kanji you might ask
for.


(B.t.w., by "all the kanji you can ask for", Christian means "all the
kanji that are in Unicode". He has been working on a project at a Kyoto
university with over 48,000 kanji, so he is well aware of the need for
ISO 10646 to be extended!)

Rick Jelliffe