RE: Thoughts about characters transmission from Luc Rooijakkers on 1993-07-22 (ietf-charsets@w3.org from July to September 1993)

From: Luc Rooijakkers <luc@opus.spc.nl>
Date: Thu, 22 Jul 1993 09:25:11 +0100
To: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Cc: ietf-charsets@INNOSOFT.COM
Message-id: <9307220825.AA19495@opus.spc.nl>

Masataka Ohta writes:

> But, I have disappointed that NET-TEXT does not solve the unfairness, the
> currently recognized issue of UTF2, at all.

There are some errors in the NET-TEXT message with regard to UTF-2
sequences of more than 3 bytes, but the basic premise was to remain
compatible with UTF-2. This may or may not be a worthwhile goal,
as Ohta-san rightly pointed out, but I believe NET-TEXT is pretty much
the minimal extension you can make while still remaining compatible.
Anyone want to comment on this?

> With additional 2 single octet encoding and 60 two octet encoding at most,
> you can't encode non-European characters as efficient as the European
> ones.

Note that any non-ASCII character requires at least 2 bytes
in any applicable encoding. If I understand you right, you would like
2-octet representations for all of GB, JIS and KSC, right? While this is
theoretically possible (these are all 94^2 charsets and hence require
3 * 94^2 = 26508 combinations), I don't see any solution, since there are
at most 7 bits per octet available (octets < 128 should occur only when
representing the corresponding ASCII character), which leaves at most
2^14 = 16384 two-octet combinations. So, would you be
willing to accept 3-byte encodings for these?
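
The arithmetic above can be checked with a short sketch. It assumes an
idealised upper bound of 7 usable bits in every octet; real encodings such
as UTF-2 spend some of those bits on framing, so the true capacity is
lower. The names `capacity` and `min_octets` are illustrative only and do
not come from NET-TEXT or UTF-2.

```python
# Code positions needed for three 94x94 character sets (GB, JIS, KSC).
CHARSETS = ("GB", "JIS", "KSC")
POSITIONS_PER_SET = 94 ** 2                  # 8836 positions in one 94^2 set
needed = len(CHARSETS) * POSITIONS_PER_SET   # 3 * 94^2


def capacity(octets, bits_per_octet=7):
    """Upper bound on distinct sequences when each octet carries
    at most `bits_per_octet` bits (assumed: 7, octets >= 128 only)."""
    return 2 ** (bits_per_octet * octets)


def min_octets(positions):
    """Smallest sequence length whose upper-bound capacity covers `positions`."""
    length = 1
    while capacity(length) < positions:
        length += 1
    return length


if __name__ == "__main__":
    print("positions needed:", needed)
    print("two-octet upper bound:", capacity(2))
    print("minimum octets under this bound:", min_octets(needed))
```

Under this bound two octets fall short of the required number of
combinations, which is why the question of 3-byte encodings arises.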

Also, I'd like some comments from other people as well.

> The article also contains imcomplete and incorrect summary of the bof.

Incomplete, yes, but could you please explain to me what was incorrect?

--
Luc Rooijakkers                                 Internet: lwj@cs.kun.nl
SPC Company, the Netherlands                    UUCP: uunet!cs.kun.nl!lwj

Received on Thursday, 22 July 1993 00:44:00 UTC