RE: Compatibility with Unicode from Masataka Ohta on 1993-11-03 (ietf-charsets@w3.org from October to December 1993)

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Date: Thu, 04 Nov 1993 03:48:42 +0900 (JST)
To: dank@blacks.jpl.nasa.gov
Cc: ietf-charsets@INNOSOFT.COM
Message-id: <9311031848.AA07703@necom830.cc.titech.ac.jp>

> You've probably already discussed this, but ...

Not in detail in the mailing list. So, let me explain.

> Several major OS's appear to be committed to using 16-bit Unicode.
> Therefore the larger character set this list is discussing must be 
> compatible with Unicode at least to the degree that it can be displayed 
> in degraded form on Unicode PC's, and allow Unicode text to be mapped in.
> Preferably this should be possible without huge lookup tables.

I can understand what you want to say, though the size of the lookup table
for your 18 bit code in 32 bit space (256K*4B=1MB) is not so huge compared
to the size of a single low resolution font set of 64K characters
(16dot*16dot*64K=2MB).  Here, the fact that UNICODE needs an additional
LARGE table and fonts to display combining characters (the rules for
synthesis in some languages are not simple and need separate fonts) is
neglected, so please don't say that the negligibly small number of Latin
characters can be represented with 5 dot by 7 dot.
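
The two sizes quoted above can be checked with a few lines of Python. This
is only a sketch of the arithmetic: it assumes 1 bit per dot in the font and
one 4-byte entry per code point in the table, as the figures in the
paragraph imply. The variable names are illustrative.

```python
# Size comparison using the figures quoted in the message.
KIB = 1024
MIB = KIB * KIB

# Lookup table: one 4-byte (32-bit) entry for every value of an 18-bit code.
table_entries = 2 ** 18              # 256K entries
table_bytes = table_entries * 4

# Bitmap font: 16 x 16 dots, assumed 1 bit per dot, for 64K characters.
glyph_bytes = (16 * 16) // 8         # 32 bytes per glyph
font_bytes = glyph_bytes * 64 * KIB

print(table_bytes // MIB, "MB lookup table")
print(font_bytes // MIB, "MB font set")
```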

But many people can't follow even such a simple calculation, so a simple
rule is better to avoid stupid debate.

> One solution would be to keep Unicode's Han unification, but add two bits
> above and beyond Unicode to indicate the language of each
> Han character.

Your idea is quite good, but...

Considering that the use of Han characters has been developed in
at least 5 countries:

	Mainland China
	Taiwan
	Japan
	Korea
	Vietnam

each in its own way, 2 bits (4 values) are not enough; the addition of
3 bits is necessary.
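
The bit count follows from the length of the list. A minimal sketch, where
bits_needed is a helper name made up for this illustration:

```python
def bits_needed(n_values: int) -> int:
    """Smallest number of bits that gives each of n_values a distinct code."""
    return max(1, (n_values - 1).bit_length())

# The five traditions listed in the message.
traditions = ["Mainland China", "Taiwan", "Japan", "Korea", "Vietnam"]

# 2 bits give only 4 distinct tags, so 5 traditions need a third bit.
print(bits_needed(len(traditions)))
```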

Also, contrary to its marketing hype, UNICODE does not contain all the
Han characters (only about half). To add them and other possibly missing
non-Han characters later, we need 1 additional bit.

And, then, if we want to support bi-directionality without introducing
long term states (that is, to avoid escape sequences, designation,
announcers and such), we need 1 more bit to encode the bi-directionality
state into the character (or some encoding unit similar to a character, if
you have your own definition of a 'character') code.

> When mapping to Unicode, these two bits could simply
> be thrown away, just as upper and lower case ASCII are displayed on
> an old upper-case only terminal by throwing away a single bit.

That's why I proposed ICODE, which is a 21 bit (16+3+1+1) extension of
UNICODE.
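
As an illustration of how such a 21 bit code could degrade to 16 bit UNICODE
by throwing bits away, here is a small sketch. The field positions (UNICODE
value in the low 16 bits, then 3 bits of regional tag, 1 extension bit and
1 direction bit) are an assumption made for this example only; the message
gives the bit counts but not the actual ICODE layout. The names pack and
to_unicode are likewise hypothetical.

```python
# Hypothetical field layout; only the widths (16+3+1+1) come from the message.
UNICODE_MASK = 0xFFFF   # bits 0-15: the 16 bit UNICODE value
REGION_SHIFT = 16       # bits 16-18: regional tag for Han characters
EXT_SHIFT = 19          # bit 19: reserved for characters missing from UNICODE
DIR_SHIFT = 20          # bit 20: directionality state

def pack(unicode_value: int, region: int = 0, ext: int = 0, direction: int = 0) -> int:
    """Combine the four fields into one 21 bit integer."""
    if not 0 <= unicode_value <= UNICODE_MASK:
        raise ValueError("unicode_value must fit in 16 bits")
    if not 0 <= region < 8:
        raise ValueError("region must fit in 3 bits")
    if ext not in (0, 1) or direction not in (0, 1):
        raise ValueError("ext and direction are single bits")
    return (unicode_value
            | (region << REGION_SHIFT)
            | (ext << EXT_SHIFT)
            | (direction << DIR_SHIFT))

def to_unicode(code: int) -> int:
    """Degrade to 16 bit UNICODE by discarding the upper 5 bits."""
    return code & UNICODE_MASK

# Example: a Han character tagged with region value 2 (an arbitrary tag here).
code = pack(0x6F22, region=2)
print(hex(code), hex(to_unicode(code)))
```

Mapping back to UNICODE is then a single mask operation with no lookup
table, which is the property the quoted proposal asked for.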

							Masataka Ohta


Received on Wednesday, 3 November 1993 10:53:55 UTC