RE: BOF from Masataka Ohta on 1993-08-24 (ietf-charsets@w3.org from July to September 1993)

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Date: Tue, 24 Aug 1993 14:45:41 +0900 (JST)
To: jerman-blazic@ijs.si (Borka Jerman-Blazic)
Cc: wg-char@rare.nl, ietf-charsets@INNOSOFT.COM
Message-id: <9308240545.AA05798@necom830.cc.titech.ac.jp>
> I went very quickly over Ohta-san's comments on the BOF Minutes and found
> that even he agrees on the common issues to be worked out.

Even me? Good joke. Anyway,

> >> Identity for encoding and decoding which he  >  The
> >> discussion showed that the proposed solution is not in the general
> >> stream of the development of the standard character set codes and
> >> their applications in the computing systems.
> 
> >"the general stream of the development"? What's that? It was only that my
> >proposal was not compatible with UCS4. It is, now.
> 
> Do you know any other scheme of development of character set codes at
> the international level? I do not.

	Full ISO 2022
	EUC
	Compound Text of the X Consortium
	ISO-2022-JP-2 (recently developed and will soon be announced)

are ISO 2022 based character sets.

Also, with the charset scheme of MIME, various non-2022 codes are defined
in RFC1345 (they are not currently valid MIME names, but they could be if
someone so desires). Several others are developed in various countries.

As for 10646, ISO says, as always, everything. 10646 could be used with
UCS2, UCS4 or UTF1, with arbitrary subsetting.

I don't know precisely what X/Open is doing but they should be using
UTF2.

Plan9 supports only 16 bits of 10646 with UTF2, though its handling of the
32nd bit is somewhat different from what X/Open says.
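
For illustration, here is a minimal sketch of how a 16 bit 10646 value
could be serialized with UTF2. It assumes that UTF2 means the FSS-UTF bit
layout (the one later standardized as UTF-8). The function name
utf2_encode_16bit is hypothetical, and the sketch does not model the
Plan9 or X/Open differences mentioned above.

```python
def utf2_encode_16bit(code_point: int) -> bytes:
    """Encode one 16 bit code value into 1 to 3 octets.

    Assumption: the bit layout is that of FSS-UTF (later UTF-8),
    restricted to the 16 bit range 0x0000..0xFFFF.
    """
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("only 16 bit values are handled in this sketch")
    if code_point < 0x80:
        # 0xxxxxxx: ASCII is passed through unchanged
        return bytes([code_point])
    if code_point < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    # 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([0xE0 | (code_point >> 12),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


if __name__ == "__main__":
    for value in (0x41, 0xE9, 0x3042):
        print(hex(value), utf2_encode_16bit(value).hex())
```

ASCII octets stay as they are, which is why such an encoding can pass
through software that only knows ASCII.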

Microsoft also supports 16 bits only. It uses a bare 16 bit representation
(maybe little endian in octet serialized files). It does not support all
Han characters (JIS only in Japan). They also support the existing MS Kanji
code (so-called Shift JIS) in Japan.

How about Apple? I won't be much surprised if they use big endian in files.
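
To make the byte order question concrete, here is a small sketch of
serializing bare 16 bit values into octets in both orders. The function
name serialize_16bit is hypothetical, and nothing here is taken from any
vendor's actual file format.

```python
import struct


def serialize_16bit(values, little_endian: bool) -> bytes:
    """Serialize a sequence of 16 bit code values into octets.

    '<' selects little endian and '>' selects big endian in struct's
    format syntax; 'H' is an unsigned 16 bit integer.
    """
    order = "<" if little_endian else ">"
    return struct.pack(order + "H" * len(values), *values)


if __name__ == "__main__":
    hiragana_a = [0x3042]
    print(serialize_16bit(hiragana_a, little_endian=True).hex())   # 4230
    print(serialize_16bit(hiragana_a, little_endian=False).hex())  # 3042
```

The same 16 bit value gives two different octet sequences, so a file
written in one order cannot be read by a system assuming the other
without some extra agreement.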

So, what is "the general stream of the development"?

> >> He proposed an extension to the
> >> existing UCS code system consisting of 5 additional bits which will
> >> enable the deficiency of the UCS coding system to be overcome.
> 
> >> In the discussion the problem of handling of
> >> bidirectional text was also identified.

> What is the problem with the text of the BOF?

The problem (full bidirectionality support cannot be done with finite
state and is thus not plain text) was identified by me and then discussed,
I think. But if you think there are other bidirectionality problems
identified, it's OK.

> It is said "taking in consideration" that does not imply that UTF-2 is
> the ultimate solution.

OK.

> >> (a sort of
> >> guidelines for services dealing with multilinguality such as NIR
> >> service based on usage of plain text),
> 
> >What do you mean? Aren't you assuming MIME-like labeling of character
> >sets each containing only a limited number of characters?
> 
> I do not mean anything. We just pointed out the most concerned services
> based on plain text and that is all. We did not suggest any solution!

So, could you explain what the most concerned services are? I just want to
know, because I don't think we can seek the solution for all the
multilingual problems here now.

As for the character encoding scheme, I think we should make a scheme
with which all the languages in the world can be encoded/decoded even
if dozens of languages are mixed freely within a single text.

But should we address the further multilinguality issues? For example,
specifying 8859-1 does not mean one can accept all of English, French and
German. Should we develop protocols to treat that type of multilinguality
now?

I think we shouldn't until we have an agreed ultimate encoding scheme.

> >Yes. Shouldn't we also address input issues for outgoing characters from
> >ASCII environment?
> 
> This is contained in it implicitly.

I think the input issue will be much more important within 3, 5 or maybe
10 years, when terminals without full graphic capability disappear along
with mainframes (which does not mean we don't have to address the output
issue now). And then the commonly available keyboard will still be ASCII.
So, could you make it explicit?

> >> -a  proposal  for  extending  the  mandatory  issues  which have to be
> >> covered in the RFC standardization process to  include  character  set
> >> consideration/support.
> 
> >Really? Hmmm. Good luck.
> 
> O.K. Let us try. Maybe the problem is so difficult and we will not come to
> an agreement but what is your proposal?

I don't think the problem is technically difficult. I have no proposal.
So, if someone wants to do it, do it.

But, as I want to have a single, general purpose solution so that each
protocol does not have to worry about character set issues, I don't
think I have to do it by myself.

							Masataka Ohta

Received on Monday, 23 August 1993 22:52:24 UTC