- From: Jonathan Rosenne <rosenne@NetVision.net.il>
- Date: Fri, 18 Oct 1996 22:38:01 +0300
- To: Keld Jørn Simonsen <keld@dkuug.dk>
- CC: WWW-International List <www-international@w3.org>
Keld Jørn Simonsen wrote:
> I believe that since we are talking international environments, and the
> HTML language, we are using ISO/IEC 10646 as prescribed in the
> HTML specifications, and then coding characters according to that.
> The only way defined in 10646 to encode LATIN SMALL LETTER A WITH ACUTE
> is as 00E1, other ways of defining this character is not defined in
> 10646.

Keld, this is misleading, albeit correct in a very narrow sense: the letter followed by the combining character is defined as a "composite sequence", not as a "character". However, there are several counter-indications in the text. I refer you, for example, to clause 15, and would note that nowhere is it specified that HTML is restricted to implementation level 1. On the contrary, it is obvious that the full implementation, level 3, is what is specified.

"15 Implementation levels

"ISO/IEC 10646 specifies three levels of implementation. Combining characters are described in 23 and listed in annex B.

"15.1 Implementation level 1

"When implementation level 1 is used, a CC-data-element shall not contain coded representations of combining characters (see clause B.1) nor of characters from HANGUL JAMO block (see clause 24)."

...

"15.3 Implementation level 3

"When implementation level 3 is used, a CC-data-element may contain coded representations of any characters."

See also clause 23.1: "for example, coded representations of LATIN SMALL LETTER A followed by COMBINING TILDE represent a composite sequence for Latin "ã" [LATIN SMALL LETTER A WITH TILDE]".

See also the note at the end of clause 23.3: "NOTE - Where combining characters are used for the generation of composite sequences in implementation level 3, this facility may be used to provide an alternative coded representation of text. For example, in implementation level 3 the French word "là" may be represented by the characters LATIN SMALL LETTER L followed by LATIN SMALL LETTER A WITH GRAVE, or may be represented by the characters LATIN SMALL LETTER L followed by LATIN SMALL LETTER A followed by COMBINING GRAVE ACCENT".

I don't see the value of the 10646 implementation levels in the international milieu. They run contrary to the purpose of achieving a truly international character code. If people want an inter-European or North Atlantic character code, they are welcome to it, and it is a good thing they base it on 10646, as long as they realize that it is just a regional solution.

The question of composed vs. composite was part of the Unicode/10646 compromise that made the approval of 10646 possible. It happens from time to time in standards work that the authorized committees make decisions contrary to my opinions. The only way to make progress and get on with the work is to accept these decisions and live with them. Once the standard is published, that's it: people out there implement it the way they understand it, and most of them know and care only about the actual standard document. All the working papers, minutes, submissions, explanations, etc. that are not incorporated in the document are, from that point on, moot and irrelevant.

Venlig hilsen (kind regards),
Jonathan Rosenne
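The two codings of "là" in the clause 23.3 note, and the level 1 restriction quoted from clause 15.1, can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not part of the original message: it approximates the annex B list of combining characters by Unicode's combining-mark categories (Mn, Mc, Me), takes the HANGUL JAMO block to be U+1100..U+11FF, and uses Unicode normalization (NFC/NFD), a later Unicode mechanism rather than anything defined by the 10646 implementation levels, to show that the two sequences represent the same text. The helper names `is_combining`, `allowed_at_level_1` and `same_text` are hypothetical.

```python
import unicodedata

# The two coded representations of "là" from the clause 23.3 note.
precomposed = "l\u00e0"  # LATIN SMALL LETTER L + LATIN SMALL LETTER A WITH GRAVE
composite = "la\u0300"   # LATIN SMALL LETTER L + LATIN SMALL LETTER A + COMBINING GRAVE ACCENT

# Assumption: HANGUL JAMO block taken as U+1100..U+11FF.
HANGUL_JAMO = range(0x1100, 0x1200)


def is_combining(ch: str) -> bool:
    """Hypothetical helper: approximates annex B by Unicode mark categories."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def allowed_at_level_1(text: str) -> bool:
    """Hypothetical rough check of clause 15.1: no combining marks, no Hangul Jamo."""
    return not any(is_combining(ch) or ord(ch) in HANGUL_JAMO for ch in text)


def same_text(a: str, b: str) -> bool:
    """Compare after Unicode NFC normalization (canonical equivalence)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(precomposed == composite)           # False: different code sequences
    print(same_text(precomposed, composite))  # True: same text after NFC
    print(allowed_at_level_1(precomposed))    # True
    print(allowed_at_level_1(composite))      # False: contains U+0300
```

Only the precomposed form passes the level 1 filter, while a level 3 receiver that normalizes would treat both strings as the same word.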
Received on Friday, 18 October 1996 16:40:55 UTC