- From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
- Date: Wed, 21 Jul 1993 15:59:18 +0900 (JST)
- To: ietf-charsets@INNOSOFT.COM
Before proceeding to the detailed discussion, I would like to clarify our goals and current issues.

First of all, I have made a summary of the discussion at the ucs-bof at the last IETF in Amsterdam. Any comments or corrections?

1) Many existing protocols were evaluated on how they adapt to extended character sets.

2) The first issue was whether we should extend all the protocols so that they can negotiate or announce the character set used, or whether we should provide a single universal encoding of text. No one said the former is better, and the discussion continued only on how to implement the latter.

3) Assuming ISO 10646, whether we should use "16bit byte" or a UTF style encoding was discussed. After a brief explanation that "16bit byte" is incompatible with current ASCII files and ASCII based protocols, and that "16bit byte" is an obstacle to 32bitness even though ISO 10646 is now being extended beyond 16 bits, no one argued for "16bit byte" anymore (though some might still silently think "16bit byte" is the way to go).

4) If we are to have a single universal text encoding, the encoding should be good enough in every respect. Several requirements for the encoding were presented by me.

   Plain Text Processing
      We should focus on the processing of plain text.

   Universality
      The encoding must be able to restore the original content of an encoded plain text without any negotiation or profiling. This requirement is already stated in 2).

   Causality
      Because of the law of causality, the decoding process cannot depend on an event that has not yet happened. Thus, for interactive processing, where immediate output is required, the shape of a character cannot depend on a next character that may not yet have been typed.

   Finitestateness
      The decoding process might be controlled by a stateful automaton. But, as far as plain text processing is concerned, the state transitions should be representable with a finite state automaton.
   Finite Resynchronizability
      Even if the state of the finite state automaton becomes unknown, resynchronization of the state should be possible by reading a fixed finite number of bytes.

   Equality
      Equality of two texts should be defined unambiguously, of course.

   ASCII compatibility
      The encoding should be ASCII compatible so that no conversion of files and no modification of protocols is necessary.

   At the bof, there was no objection to any of the requirements.

5) It was agreed that for major European characters, ISO 10646 level 1 with UTF2 satisfies all the above requirements, but that ISO 10646 does not satisfy any of the requirements (save ASCII compatibility) if several other languages are taken into consideration.

6) A 21 bit encoding, ICODE, and its external representation, IUTF, were presented by me as an extension to ISO 10646 and UTF2 which satisfies all the requirements in 4) and also supports bidirectionality.

7) During the discussion on ICODE, it was pointed out that ICODE does use non-private code points of UCS4. So, ICODE was slightly modified to also have an explicit representation as UCS4 (not necessarily equal to ICODE) which uses the private use zone for the extension, so that ICODE is now completely compatible with ISO 10646. So, characters in ICODE now have three representations:

      UCS4
      ICODE
      IUTF

   Even if the private area is moved by the ISO in the future (as suggested by John), it is only necessary to change the mapping from ICODE to UCS4, which in the end does not affect any ICODE program because no one will use the UCS4 representation.

8) Someone (Harald, I think) said a 32bit universal encoding is the ultimate goal and showed several paths to the goal.

9) It was agreed that the ultimate goal shouldn't break ISO 10646.

10) It was agreed that the number of code conversions should be minimized.

11) Borka said it is also important to have a conversion method for characters so that many Euro characters are visible using, say, ASCII only characters. There was no objection.
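As an illustration of two of the requirements in 4), ASCII compatibility and Finite Resynchronizability, here is a minimal sketch in Python. It assumes modern UTF-8 as a stand-in for the UTF style encoding discussed at the bof; it does not implement ICODE or IUTF, and the helper names `is_continuation` and `resync` are hypothetical, chosen only for this example.

```python
def is_continuation(byte: int) -> bool:
    """True if the byte has the UTF-8 continuation pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def resync(data: bytes, pos: int) -> int:
    """Return the index of the first character boundary at or after pos.

    Assumption: data is well-formed UTF-8. A decoder that has lost its
    state only needs to skip continuation bytes, and the number of
    continuation bytes in a row is bounded by the encoding.
    """
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos


# ASCII compatibility: pure ASCII text is byte-for-byte unchanged.
ascii_text = "plain ASCII"
ascii_bytes = ascii_text.encode("utf-8")

# Resynchronization: start in the middle of a multi-byte character.
mixed = "a\u00e9\u65e5b".encode("utf-8")  # 1 + 2 + 3 + 1 bytes
start = resync(mixed, 4)                  # index 4 is inside the 3-byte character
tail = mixed[start:].decode("utf-8")
```

Starting from an arbitrary byte offset, the decoder skips at most a small fixed number of bytes before it reaches a character boundary, which is the property the requirement asks for.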
Masataka Ohta

PS

Those who did not attend the bof might think that the summary contains too much of my own presentation and opinion. But it actually took a large amount of the time of the entire BOF.
Received on Wednesday, 21 July 1993 00:03:03 UTC