Re: Proposals for 10646/Unicode in MIME from Borka Jerman-Blazic on 1993-12-21 (ietf-charsets@w3.org from October to December 1993)

From: Borka Jerman-Blazic <jerman-blazic@ijs.si>
Date: Tue, 21 Dec 1993 08:22:36 +0100
To: ietf-charsets <ietf-charsets@INNOSOFT.COM>
Cc: dcrocker <dcrocker@mordor.stanford.edu>, David_Goldsmith <David_Goldsmith@taligent.com>
Message-id: <328*/S=jerman-blazic/O=ijs/PRMD=ac/ADMD=mail/C=si/@MHS>
From M. Ohta-san's message:

>> I note here that Masataka's proposal for ISO-2022-JP-2 demonstrates what
>> we've been arguing all along: it is not enough to just have a character
>> encoding.

Yes!

>Recently I have avoided using the word "character" as much as possible and
>use the phrase "text encoding", because the concept of "character"
>beyond ASCII cannot be well defined. Various units of text encoding
>are necessary for different purposes.

As long as you speak and write about ISO 2022 and UCS, you have to speak
about characters and character sets to avoid a mess! What a character
is, is well defined in these documents, and that is why you have to keep the
meaning of the terminology clear.

>Thus, I think names such as MIME charset and ietf-charsets ML
>are no good.

This is something else. Maybe this list should be called internationalisation
of the services or something similar. We know that character sets are the main
issue in internationalisation, but not the only one!

> >There also needs to be some form of markup to distinguish
> >different usages of the same character encoding.  ISO-2022-JP-2 uses
> >escape sequences to do markup, whereas a UNICODE version of text/enriched
> >would use <...> tags.

>ISO-2022-JP-2 does not do any markup. It is for plain text.

>It is finite state. It has no nesting.

Yes, you are right.

>I don't think anything with nested structure is plain text.

>It is and its successors will be as stateless as practically possible
>with ISO 2022.

ISO 2022 has no successors and will not have any. What you have are just derivatives,
which are not legal if ISO 2022 is taken as the reference and followed (i.e. the use of G0).

>That is, at the beginning of a line, the state can be assumed to be unique.

Not always!
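Whether "the state is unique at the beginning of a line" holds is a property of a given byte stream, which is also why "Not always!" is a fair reply: a stream can violate it. The hypothetical Python checker below (not part of any standard or library) reports which G0 character set is in effect at the start of each line. It assumes the G0 designation sequences that RFC 1554 lists for ISO-2022-JP-2, assumes the stream begins in ASCII, and deliberately ignores G2 designations and single shifts.

```python
# Hypothetical checker: which G0 set is active at the start of each line?
ESC = 0x1B
LF = 0x0A

# G0 designation sequences assumed for this sketch (the ISO-2022-JP-2 set).
G0_DESIGNATIONS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201-Roman",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
    b"\x1b$A": "GB 2312-80",
    b"\x1b$(C": "KS C 5601-1987",
    b"\x1b$(D": "JIS X 0212-1990",
}


def g0_at_line_starts(data: bytes) -> list[str]:
    """Return the G0 character set in effect at the start of each line."""
    state = "ASCII"  # assumption: the stream starts in ASCII
    starts = [state]
    i = 0
    while i < len(data):
        if data[i] == ESC:
            for seq, name in G0_DESIGNATIONS.items():
                if data.startswith(seq, i):
                    state = name
                    i += len(seq)
                    break
            else:
                i += 1  # other escapes (G2, single shifts) are not tracked here
            continue
        if data[i] == LF:
            starts.append(state)
        i += 1
    return starts


def lines_start_in_ascii(data: bytes) -> bool:
    return all(s == "ASCII" for s in g0_at_line_starts(data))


# Python's encoder switches back to ASCII before emitting a line feed.
good = "日本\nabc".encode("iso2022_jp_2")
# Hand-made stream with a line break while JIS X 0208 is still designated.
bad = b"\x1b$BF|\nK\\\x1b(B"

print(lines_start_in_ascii(good))  # True
print(g0_at_line_starts(bad))      # ['ASCII', 'JIS X 0208-1983']
```

The checker only tracks G0; a fuller version would also have to follow the G2 designations that ISO-2022-JP-2 uses for ISO 8859-1 and ISO 8859-7.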

>> The main difference I can see is that ISO-2022-JP-2
>> requires the use of markup, even when the whole message is in the same
>> language, but UNICODE can get away without markup for 99% of messages,

Yes, of course!

>It is a meaningless difference.

Not at all!

>> letting local conventions set the default language.

Yes!

>That is one very important difference.

>Unlike UNICODE, ISO-2022-JP-2 is intended to be used in internationalized
>environment. It needs no local conventions. BTW, MIME charsets also cannot
>depend on local conventions.

UNICODE was developed for internationalised environments and is implemented
in internationalised products!

> >I still fail to see why Masataka objects to UNICODE since his own proposal has
> >to jump through the same markup hoops. The only advantage of ISO-2022-JP-2
> >that I can see is that it will work on existing terminals without special
> >software in some communities.

More or less. ISO 2022 is important for some OSI applications (X.400, X.500),
where GeneralText is used as a syntax.

>Then, you can see nothing.

Could you please be more polite in your mailings?

>ISO-2022-JP-2 is produced from long and extensive
>localization/internationalization experiences in Japanese computer community
>with ISO-2022-JP, EUC, SJIS and such.

>First of all, ISO-2022-JP-2 can interoperate with ASCII.

>Next, it is 7 bit.
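The two properties claimed here, ASCII interoperability and 7-bit output, can be observed with Python's standard `iso2022_jp_2` codec. This is a minimal sketch assuming Python 3; the sample strings are arbitrary, and EUC-JP is included only as an 8-bit contrast.

```python
# Pure ASCII text is byte-for-byte identical in ASCII and ISO-2022-JP-2.
ascii_text = "Hello, world\n"
print(ascii_text.encode("iso2022_jp_2") == ascii_text.encode("ascii"))  # True

# Mixed Japanese/ASCII text still encodes to 7-bit bytes only (all < 0x80).
mixed = "MIME と 日本語\n"
encoded = mixed.encode("iso2022_jp_2")
print(all(b < 0x80 for b in encoded))           # True
print(encoded.decode("iso2022_jp_2") == mixed)  # True: lossless round trip

# For contrast, EUC-JP needs the eighth bit for the same Japanese characters.
print(any(b >= 0x80 for b in mixed.encode("euc_jp")))  # True
```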

This is just a Japanese derivative which uses the ISO 2022 extension technique
but not the rules defined for the use of that technique, so from the point of view
of ISO 2022 it is not a legal application!

ISO 2022 is equally applicable to 8-bit environments. A very well known expert
from the character set world, from a very well known manufacturer, said
that this derivative of ISO 2022 is a sort of cheating!

>Thus, it can interoperate with any ASCII compatible text encoding such
>as EUC (both UJIS and EUC-KR) and SJIS.

>More importantly, it can interoperate with the future ultimate ASCII
>compatible 8 bit encoding. Of course, UNICODE is NOT the future.

If UNICODE is NOT the future, then what is the future?

>We do know that having two or more uninteroperable encodings such
>as EUC and SJIS or ASCII and 16bit-UNICODE is the real pain.

Why?
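At the byte level, the "pain" being described is that the same text gets different, mutually unreadable byte sequences. A small Python illustration using only standard codecs (sample strings arbitrary): EUC-JP and Shift_JIS agree on ASCII but not on Japanese, while 16-bit UNICODE, shown here as big-endian UTF-16 (identical to UCS-2 for these characters), does not even agree with ASCII.

```python
text_ascii = "abc"
text_ja = "日本"

# ASCII bytes are shared by EUC-JP and Shift_JIS ...
print(text_ascii.encode("euc_jp") == text_ascii.encode("shift_jis"))  # True

# ... but the same Japanese characters are coded differently.
print(text_ja.encode("euc_jp").hex())     # c6fccbdc
print(text_ja.encode("shift_jis").hex())  # 93fa967b

# 16-bit UNICODE is not ASCII-compatible: every ASCII letter gains a 0x00 byte.
print(text_ascii.encode("utf-16-be"))     # b'\x00a\x00b\x00c'
```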

>> A specious argument at best, since the rest
>> of the world does need special software to view ISO-2022-JP-2 anyway.

Exactly!

>ISO-2022-JP-2 is, and ISO-2022-INT-1 will be, designed to aid those
>who immediately need localization.

>I don't think it will be a long-term solution.

Here you are right. Then what is the future, or long-term, solution?

>Both ISO 2022 and ISO 10646/UNICODE have a unified syntax to mix
>multilingual characters in the world. ISO 2022 is much better for
>us to be able to separate C/J/K characters.

You cannot speak about UNICODE and ISO 2022 together, because they are
not the same type of standard. ISO 2022 allows the exchange of text coded
in UNICODE by use of the registered ESCAPE SEQUENCE for UNICODE/UCS level 1.
The only correct statement is that ISO 2022 makes it possible to
separate C/J/K character set codes, because these codes are registered and
have their own ESCAPE SEQUENCES, as UCS does. Where is the difference?
With the ISO 2022 technique I can exchange C/J/K text as well as text coded in UNICODE!

>On the other hand, both ISO 2022 and ISO 10646/UNICODE lack a unified
>semantics to mix multilingual characters in the world. ISO 10646/UNICODE
>inherits the policy of ISO 2022 to treat characters in different languages
>differently. Thus, it is impossible to write a unified text processing
>library or application of meaningfully rich functionality.

This is not true!

>Thus, for the time being, our solution must be 7 bit ISO 2022.

To whom does this "MUST BE 7 bit ISO 2022" apply?

>As a long term solution, I have designed ICODE/IUTF, which has, besides
>ASCII compatibility, several useful semantical properties for, as far
>as I know, all the characters in the world. With a large enough encoding
>space (though not impractically large), the real, semantical, unification
>is possible.

ICODE was rejected by the IETF BOF on UCS in Amsterdam! You can read the
minutes in the Proceedings and find out why.

> >UNICODE has the advantage that if a message gets corrupted and the markup
> >is lost, there is still a reasonable character that can be displayed, which
> >is close enough not to cause the sky to fall in on the reader.  Such corruption
> >could easily happen when a message is quoted.  
>>What happens with ISO-2022-JP-2?
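One way to see what happens with ISO-2022-JP-2 when its "markup" is lost is to delete the ESC bytes from an encoded string, as a careless transport or quoting tool might. This is a hypothetical corruption model sketched in Python with standard codecs; UCS-2 (big-endian UTF-16 here) is shown for contrast only because it has no escape sequences to lose.

```python
original = "日本"
jp2 = original.encode("iso2022_jp_2")  # b'\x1b$BF|K\\\x1b(B'

# Simulate a transport that silently drops ESC (0x1B) bytes.
corrupted = jp2.replace(b"\x1b", b"")
print(corrupted.decode("ascii"))       # $BF|K\(B  (the kanji are gone)

# UCS-2 has no stateful escapes to strip; the characters survive
# as long as the 16-bit units themselves arrive intact.
ucs2 = original.encode("utf-16-be")
print(ucs2.decode("utf-16-be") == original)  # True
```

The corrupted ISO-2022-JP-2 text stays valid ASCII, so nothing flags the damage; the reader simply sees the JIS byte values as Latin letters and punctuation.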

>Misquoting is the issue which MUST be solved by faulty MTAs and other
>faulty transports. Providing workarounds will only result in the delay
>of the real solution.

>Instead, the real state corruption problem is caused in an interactive
>environment where individual programs output their own text streams
>simultaneously.

The real problems are in the interactive environment, i.e. in the NIR services.
All the rest (e-mail or ftp) is easy compared to that!

>With ISO-2022-JP-2, unlike text/enriched, the state is resumed at the
>beginning of the next line.
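Resuming the state at the start of each line means damage to one line's escape sequences need not spread to the next. A minimal Python sketch, assuming (as Python's `iso2022_jp_2` encoder does) that every line returns to ASCII before its line feed, so each line can be decoded on its own; `decode_by_line` is a hypothetical helper.

```python
def decode_by_line(data: bytes) -> list[str]:
    """Decode each LF-separated line independently, each starting in ASCII."""
    return [line.decode("iso2022_jp_2", errors="replace")
            for line in data.split(b"\n")]


encoded = "日本\n語".encode("iso2022_jp_2")
first, second = encoded.split(b"\n")

# Damage only the first line by dropping its leading ESC byte.
damaged = first[1:] + b"\n" + second
print(decode_by_line(damaged))  # ['$BF|K\\', '語']
```

Only the first line is garbled; the second decodes correctly because it carries its own designation sequence.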

>> People have tried time and again to add markup to UNICODE to satisfy Masataka
>> (e.g. language tags), but it just doesn't seem to satisfy him. *sigh*

Correct.

>I have *ABSOLUTELY* *NO* interest in text/enriched from the beginning.

You are interested in just one thing: how to distinguish the C/J/K
character sets when using UNICODE. Such a restricted interest cannot lead
to an international solution. All attempts in that direction will result in
a local solution for a restricted region, as ISO-2022-JP is.

>I and most of the people in the world want to process our natural
>languages as plain text in internationalized environment.

O.K.

>We already have a lot of experience to use our languages as plain text.

>You can't force us to give up plain text.

No one can do it!

Regards,

Borka Jerman-Blazic

Received on Tuesday, 21 December 1993 00:30:08 UTC