RE: Thoughts about characters transmission from Masataka Ohta on 1993-07-22 (ietf-charsets@w3.org from July to September 1993)

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Date: Thu, 22 Jul 1993 17:32:01 +0900 (JST)
To: luc@opus.spc.nl (Luc Rooijakkers)
Cc: ietf-charsets@INNOSOFT.COM
Message-id: <9307220832.AA20651@necom830.cc.titech.ac.jp>

> > But I am disappointed that NET-TEXT does not solve the unfairness, the
> > currently recognized issue with UTF-2, at all.
> 
> There are some errors in the NET-TEXT message with regard to UTF-2
> sequences of more than 3 bytes, but the basic premise was to remain
> compatible with UTF-2. This may or may not be a worthwhile goal,
> as Ohta-san rightly pointed out, but I believe NET-TEXT is pretty much
> the minimal extension you can make while still remaining compatible.

Do you know what the currently recognized issue is?
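
(The unfairness can be made concrete with a small sketch. It assumes
that UTF-2 uses the same byte-length boundaries as the scheme now
standardized as UTF-8, and restricts itself to 16-bit code values; the
sample characters are current Unicode code points chosen only for
illustration. European letters come out at 2 octets, while kana, Han
and Hangul all need 3.)

```python
# Minimal sketch: octet length of a 16-bit code value under UTF-2,
# assuming the same length boundaries as today's UTF-8.
def utf2_length(code: int) -> int:
    if not 0 <= code <= 0xFFFF:
        raise ValueError("sketch only handles 16-bit code values")
    if code < 0x80:
        return 1  # ASCII: 0xxxxxxx
    if code < 0x800:
        return 2  # 110xxxxx 10xxxxxx
    return 3      # 1110xxxx 10xxxxxx 10xxxxxx


def utf2_encode(code: int) -> bytes:
    """Encode one 16-bit code value using the length rules above."""
    n = utf2_length(code)
    if n == 1:
        return bytes([code])
    if n == 2:
        return bytes([0xC0 | (code >> 6), 0x80 | (code & 0x3F)])
    return bytes([0xE0 | (code >> 12),
                  0x80 | ((code >> 6) & 0x3F),
                  0x80 | (code & 0x3F)])


# Illustrative samples (current Unicode code points).
samples = {
    "LATIN SMALL LETTER E WITH ACUTE": 0x00E9,
    "GREEK SMALL LETTER ALPHA": 0x03B1,
    "CYRILLIC SMALL LETTER A": 0x0430,
    "HIRAGANA LETTER A": 0x3042,
    "CJK UNIFIED IDEOGRAPH-6F22": 0x6F22,
    "HANGUL SYLLABLE GA": 0xAC00,
}

for name, code in samples.items():
    print(f"U+{code:04X} {name}: {utf2_length(code)} octets")
```

(Under these assumptions every code value from U+0800 upward, which
covers all of the CJK repertoire, costs one octet more than Latin,
Greek or Cyrillic.)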

> Anyone want to comment on this?

IUTF is upward compatible with the current UTF-2.

As Plan 9 foolishly uses a 16 bit wchar_t and MSDOS also foolishly
uses a 16 bit byte, compatibility with 16 bit code is more than enough.

> > With at most 2 additional single-octet encodings and 60 two-octet
> > encodings, you can't encode non-European characters as efficiently
> > as European ones.
> 
> Note that any non-ASCII character requires at least 2 bytes,
> in any applicable encoding. If I understand you right, you would like
> 2-octet representations for all of GB, JIS and KSC, right?

Wrong, of course. That's too far-east-centric to be fair.

> While this is
> theoretically possible (these are all 94^2 charsets and hence require
> 3 * 94^2 = 26508 combinations), I don't see any solution, since there are
> at most 7 bits per octet available (octets < 128 should occur only when
> representing the corresponding ASCII character). So, would you be
> willing to accept 3-byte encodings for these? 

No. Not all of those. That's a waste of space.

For JIS, for example, Hiragana, Katakana and at least some frequently
used punctuation, and optionally some frequently used Japanese Han
characters (about 1000 at most), should be encoded with two octets.
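
(A rough sketch of the code-space arithmetic in this exchange. It
assumes, as in the quoted text, that every octet of a multi-octet
sequence is >= 128 and so carries at most 7 bits. The size of the
Japanese subset is a hypothetical upper bound: JIS X 0208 rows 1
(symbols), 4 (hiragana) and 5 (katakana) taken in full, plus 1000 kanji.
It is purely illustrative and not part of any proposal.)

```python
# Hypothetical sketch of the code-space arithmetic discussed above.
# Assumption: octets < 128 stand only for ASCII, so each octet of a
# multi-octet sequence carries at most 7 bits.

ROW = 94                   # positions per row in a 94^2 charset
SET_SIZE = ROW * ROW       # 8836 positions in GB, JIS or KSC
ALL_THREE = 3 * SET_SIZE   # 26508 positions for all three sets

BITS_PER_OCTET = 7


def capacity(octets: int) -> int:
    """Distinct sequences available with the given number of octets."""
    return 2 ** (BITS_PER_OCTET * octets)


print(f"all three sets:   {ALL_THREE}")
print(f"2-octet capacity: {capacity(2)}")  # 16384, too small for all three
print(f"3-octet capacity: {capacity(3)}")  # 2097152, ample

# Alternative: give two octets only to a frequently used subset.
# Illustrative upper bound for Japanese (assumption, see above):
# three full 94-position rows plus about 1000 kanji.
JAPANESE_SUBSET = 3 * ROW + 1000
print(f"Japanese 2-octet subset: {JAPANESE_SUBSET}")
print(f"subsets of that size in 2 octets: {capacity(2) // JAPANESE_SUBSET}")
```

(With these assumptions the full three sets overflow the two-octet
space, while a subset of this size uses under a tenth of it, leaving
room for comparable subsets of roughly a dozen other languages.)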

						Masataka Ohta


Received on Thursday, 22 July 1993 01:36:31 UTC