Re: Don't we need a standard way to represent language in Unicode? from Masataka Ohta on 1994-02-11 (ietf-charsets@w3.org from January to March 1994)

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Date: Sat, 12 Feb 1994 01:16:42 +0900 (JST)
To: dank@alumni.cco.caltech.edu (Daniel R. Kegel)
Cc: ietf-charsets@INNOSOFT.COM, insoft-l@cis.vutbr.cz, ISO10646@jhuvm.hcf.jhu.edu
Message-id: <9402111616.AA20816@necom830.cc.titech.ac.jp>

> Mr. Freytag pointed out that, although Microsoft is devoted to 16 bit Unicode 
> for Windows NT, and will not switch to a 32 bit encoding, users can mix 
> fonts in Rich Text Format documents to achieve proper display.

So, Microsoft Word can do so. But I want to use multilingual plain text.

> An NT programmer at Caltech pointed out that fonts in NT can be tagged with 
> language, so language can (at least potentially) be deduced from the font 
> being used, and a font can be chosen that is appropriate for a language.

As long as you use rich text, yes.

> I hope this will be the case in practise.

The problem is that we, in practice, need multilingual plain text.

Not everybody in the world uses Microsoft Word.

> This means that Windows-NT should be able to interoperate with the 32
> bit option of ISO10646, with a little work;

Yes, with a little profiling, which means a lot of pain and inconvenience
both for vendors and for end users. Of course, multilingual plain text can't
be handled.

> (I am still curious as to whether the 32 bit option of ISO10646 will
> start out as Unicode plus two bits to indicate language, e.g.
> plane 00 = Unicode, plane 01 = Chinese subset of Unicode Han, plane 02 = 
> Korean subset of Unicode Han, plane 03 = Japanese subset of Unicode Han.

Basically, you are right. But don't forget the 30,000 Han characters (in each
country) which are not included in the current Unicode (1 more bit),
the entire family of Vietnamese Han (1 more bit), and stateless
bi-directionality support (1 more bit).

Thus, you need at least 21 bits. See my paper on ICODE presented at
JWCC 93.
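
The bit count above can be tallied as 16 (Unicode) + 2 (language) + 1 + 1 + 1
= 21. A minimal sketch of that arithmetic follows. The field names, their
order, and the packing function are hypothetical and chosen only for
illustration; they are not the actual ICODE layout.

```python
# Hypothetical field layout: names, order, and widths are assumptions that
# only tally the bits listed in the message, not the real ICODE definition.
FIELDS = [
    ("unicode", 16),         # 16-bit Unicode code value
    ("language", 2),         # Unicode / Chinese / Korean / Japanese Han
    ("extra_han", 1),        # Han characters missing from current Unicode
    ("vietnamese_han", 1),   # Vietnamese Han family
    ("bidi", 1),             # stateless bi-directionality support
]


def total_bits(fields=FIELDS):
    """Return the sum of all field widths."""
    return sum(width for _, width in fields)


def pack(values, fields=FIELDS):
    """Pack field values into one integer, first field in the lowest bits.

    Fields missing from `values` default to 0.
    """
    word = 0
    shift = 0
    for name, width in fields:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bit(s)")
        word |= value << shift
        shift += width
    return word


print(total_bits())  # 21
```

Under this assumed layout, a Han code value tagged with language 3 would be
pack({"unicode": 0x4E00, "language": 3}), and any value that overflows its
field raises ValueError.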

						Masataka Ohta


Received on Friday, 11 February 1994 08:22:32 UTC