RE: Compatibility with Unicode from Masataka Ohta on 1993-11-05 (ietf-charsets@w3.org from October to December 1993)

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Date: Sat, 06 Nov 1993 03:05:38 +0900 (JST)
To: dank@blacks.jpl.nasa.gov
Cc: ietf-charsets@INNOSOFT.COM
Message-id: <9311051805.AA16889@necom830.cc.titech.ac.jp>

> I wrote:
> > [Because Unicode is being incorporated into major OS's,

I have interpreted "major OS's" as OSes for PCs, which do have
CRTs.

Was that incorrect?

> There is a lot of software which deals with text but doesn't need to
> display it. For instance, Perl programs (e.g. several of the whois++
> servers) operate on text without managing its display on a CRT.
> It would be nice if one could write a function to convert to Unicode
> in Perl without having to lug a megabyte table around.  It certainly
> would make things easier for the little guy.

The table could be read (or mmapped) from a shared file. Or, are you saying
we should save 1MB of memory?
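
A minimal sketch of the shared-table idea in Python. The table file format (a flat array of big-endian 16-bit UNICODE values indexed by the source character code) and the class name MappedTable are hypothetical, chosen only for illustration. Mapping the file read-only lets the operating system share its pages among every process that uses the table, so each small program does not have to carry its own private megabyte copy.

```python
import mmap
import struct

# Hypothetical table format (an assumption): a flat array of big-endian
# 16-bit UNICODE values, indexed by the source character code.
ENTRY = struct.Struct(">H")

class MappedTable:
    """Conversion lookups against a table file shared via mmap, rather than
    one read into each process's private memory."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; read-only pages can be shared by the OS.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def to_unicode(self, code):
        offset = code * ENTRY.size
        if code < 0 or offset + ENTRY.size > len(self._map):
            raise KeyError(code)
        return ENTRY.unpack_from(self._map, offset)[0]

    def close(self):
        self._map.close()
        self._file.close()
```

Only the pages actually touched by lookups are brought into memory, and they are shared with any other process that maps the same file.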

> > But many people can't understand even such a simple calculation; a simple
> > rule is better to avoid stupid debating.
> It's always better to avoid stupid debating!  But the people on this
> list aren't stupid, even if they do have trouble communicating sometimes.

People on this list? No, though we do have trouble communicating now.

When I designed ICODE, this list had not yet been formed. I was referring
to people in general.

> > [ My proposed ICODE is roughly UNICODE extended with five more
> >   bits to handle five flavors of Han (3 bits) plus missing Han characters
> >   (1 bit) plus bidirectionality (1 bit). ]
> Did I summarize you correctly?

Mostly. What is missing in UNICODE is not limited to Han.

> This sounds good- are you really saying that one can convert half of the ICODE
> characters to UNICODE by throwing away the upper five bits, at the cost
> of losing information about the 'dialect' of Han and the 'bidirectionality'?

No. UNICODE for some languages, say, Arabic, has other problems, such
as lack of causality. The problem could be taken care of by registering
"missing" characters (actually not missing, because they are composable).

> And that UNICODE can be converted to ICODE by adding five bits of zeroes?

Yes, if you can figure out what the five bits should be.
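
A minimal Python sketch of the "UNICODE plus five bits" layout summarized above. The bit positions (UNICODE in bits 0-15, Han flavor in bits 16-18, a "missing character" flag in bit 19, a reversed-directionality flag in bit 20) and all function names are hypothetical; the proposal fixes only the bit counts, not their placement. It shows the direction answered "Yes" (UNICODE to ICODE by adding zero bits) and a field decoder. It deliberately offers no ICODE-to-UNICODE conversion, since, per the answer above, simply dropping the upper bits is not a valid conversion in general.

```python
# Hypothetical layout, assumed for illustration only:
#   bits 0-15 : UNICODE (16-bit) value
#   bits 16-18: Han flavor (0 = unspecified)
#   bit  19   : "missing character" flag (not representable in UNICODE)
#   bit  20   : reversed-directionality flag
UNICODE_MASK = 0xFFFF
FLAVOR_SHIFT, FLAVOR_MASK = 16, 0b111
MISSING_BIT = 1 << 19
REVERSED_BIT = 1 << 20

def make_icode(value, flavor=0, missing=False, reversed_dir=False):
    """Pack a 16-bit value and the five extra bits into one 21-bit code."""
    if not 0 <= value <= UNICODE_MASK:
        raise ValueError("value must fit in 16 bits")
    if not 0 <= flavor <= FLAVOR_MASK:
        raise ValueError("Han flavor must fit in 3 bits")
    code = value | (flavor << FLAVOR_SHIFT)
    if missing:
        code |= MISSING_BIT
    if reversed_dir:
        code |= REVERSED_BIT
    return code

def unicode_to_icode(value):
    # Five zero bits: no Han flavor, not missing, natural direction.
    return make_icode(value)

def fields(code):
    """Split a code into (value, flavor, missing, reversed_dir)."""
    return (code & UNICODE_MASK,
            (code >> FLAVOR_SHIFT) & FLAVOR_MASK,
            bool(code & MISSING_BIT),
            bool(code & REVERSED_BIT))
```

With this layout every code fits in 21 bits, and a plain UNICODE character keeps the same numeric value.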

> I must confess, I am still uncertain what the 'bidirectionality' information
> is that you want to encode with that extra bit.  Is it the direction
> (left->right or right->left) that the character is normally written with?

Yes.

Each character has its own natural directionality.

You don't have directionality issues if all the characters in a line
have the same directionality.

If, with ICODE, the directionality bit of a character is 1, it means
that the directionality of the character is reversed.

Then you can make all the characters in a line have the same
directionality, and the directionality issues disappear. Note that
characters with unnatural directionality must be spelled backward.
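
A minimal Python sketch of the scheme above, under stated assumptions: each character arrives in logical order tagged with a natural direction "L" or "R", and neutral characters such as spaces are simply treated as having the line's base direction (a simplification; a real design would need a rule for neutrals). Each maximal run of opposite-direction characters is stored backward with its reversed flag set, so the stored order is already the visual order and a renderer needs no state beyond the current character. The function names are hypothetical; decode_line only demonstrates that the transformation is reversible.

```python
def encode_line(chars, base="L"):
    """chars: (character, natural_direction) pairs in logical order.
    Returns (character, reversed_flag) pairs in display order."""
    out, run = [], []
    for ch, direction in chars:
        if direction == base:
            # Flush the pending opposite-direction run, spelled backward.
            out.extend((c, True) for c in reversed(run))
            run = []
            out.append((ch, False))
        else:
            run.append(ch)
    out.extend((c, True) for c in reversed(run))
    return out

def decode_line(encoded):
    """Recover logical order by reversing each run of flagged characters."""
    out, run = [], []
    for ch, flag in encoded:
        if flag:
            run.append(ch)
        else:
            out.extend(reversed(run))
            run = []
            out.append(ch)
    out.extend(reversed(run))
    return out
```

For example, with "XYZ" standing in for a right-to-left word, the logical line "ab XYZ c" is stored as "ab ZYX c", with the flag set on Z, Y and X only.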

If you have any other scheme that does not need long-term
state, please let me know.

						Masataka Ohta


Received on Friday, 5 November 1993 10:10:28 UTC