- From: Keld Jørn Simonsen <keld@dkuug.dk>
- Date: Mon, 21 Oct 1996 22:44:58 +0200
- To: Jonathan Rosenne <rosenne@NetVision.net.il>
- Cc: WWW-International List <www-international@w3.org>
Jonathan Rosenne writes:

> Keld Jørn Simonsen wrote:
>
> > I believe that since we are talking international environments, and the
> > HTML language, we are using ISO/IEC 10646 as prescribed in the
> > HTML specifications, and then coding characters according to that.
> > The only way defined in 10646 to encode LATIN SMALL LETTER A WITH ACUTE
> > is as 00E1, other ways of defining this character is not defined in
> > 10646.
>
> Keld, this is misleading albeit correct in a very narrow sense, the
> letter followed by the combining character having been defined as a
> "composite sequence" and not as a "character". However, there are
> several counter-indications in the text.

Could you point me to some? I don't read clause 15 below as normatively
specifying other ways of encoding A-ACUTE than 00E1. I do not think
that what I have said is misleading about what is normatively specified
in 10646 and what is not. I invite you to tell me where the encoding of
A-ACUTE is normatively defined with combining characters in the 10646
standard.

> I refer you, for example, to clause 15, and would like to note that
> nowhere was it specified that HTML will be restricted to implementation
> level 1. On the contrary, it is obvious that the full implementation,
> level 3, is specified.

I agree that HTML uses full canonical 10646. I do not see that
clause 15 below defines an unambiguous coding of information using
combining characters. It clearly allows it, as I have also said before,
but in an unspecified, implementation-defined way. All I am saying is
that we should avoid the unspecified, implementation-defined encoding
and stick to what is well-defined in the standard. That also allows us
to use other specifications that build on the well-defined encoding of
10646.

> "15 Implementation levels
>
> "ISO/IEC 10646 specifies three levels of implementation. Combining
> characters are described in 23 and listed in annex B.
>
> "15.1 Implementation level 1
>
> "When implementation level 1 is used, a CC-data-element shall not
> contain coded representations of combining characters (see clause B.1)
> nor of characters from HANGUL JAMO block (see clause 24)."
>
> ...
>
> "15.3 Implementation level 3
>
> "When implementation level 3 is used, a CC-data-element may contain
> coded representations of any characters."
>
> See also clause 23.1: "for example, coded representations of LATIN SMALL
> LETTER A followed by COMBINING TILDE represent a composite sequence for
> Latin "ã" [LATIN SMALL LETTER A WITH TILDE]".
>
> See also the note at the end of clause 23.3: "NOTE - Where combining
> characters are used for the generation of composite sequences in
> implementation level 3, this facility may be used to provide an
> alternative coded representation of text. For example, in implementation
> level 3 the French word "là" may be represented by the characters LATIN
> SMALL LETTER L followed by LATIN SMALL LETTER A WITH GRAVE, or may be
> represented by the characters LATIN SMALL LETTER L followed by LATIN
> SMALL LETTER A followed by COMBINING GRAVE ACCENT".
>
> I don't see the value of the 10646 implementation levels in the
> international milieu. They are contrary to the purpose of achieving a
> truly international character code. If people want an inter-European or
> North Atlantic character code they are welcome, and it is a good thing
> they base it on 10646, as long as they do realize that it is just a
> regional solution.

I agree that level 3 should be allowed in international encoding.
But I also advise that if people all over the world would like to
encode an A-ACUTE, then they should use the encoding defined in 10646
for that character, namely 00E1.

> The question of composed vs. composite was part of the Unicode - 10646
> compromise that made the approval of 10646 possible.
> It happens from
> time to time in standards work that the authorized committees make
> decisions that are contrary to my opinions. The only way to make
> progress and to get on with the work is to accept these decisions and
> live with them.

Yes, SC2/WG2 has decided that equivalence tables should not be
standardized, as they are culturally offensive. So why don't we just
live with this decision and avoid them?

> Once the standard is published, that's it, people out there are
> implementing it the way they understand it - and most of them only know
> and care about the actual standard document, and all the working papers,
> minutes, submissions, explanations etc. that are not incorporated in the
> document are from that point on mute and irrelevant.

I agree that we should stick to the standards as they are, and not
incorporate a number of things that we would like to see, but were not
defined in the standards.

Keld
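
The two representations of the French word "là" quoted from the note in
clause 23.3 can be made concrete with a short sketch. It uses Python's
standard unicodedata module. The NFC comparison relies on the Unicode
normalization forms, which are an assumption brought in for illustration
and are not part of the 10646 text discussed in this thread; the helper
name same_text is hypothetical.

```python
import unicodedata

# The two coded representations of "là" from the note in clause 23.3.
precomposed = "l\u00e0"   # L, then LATIN SMALL LETTER A WITH GRAVE
composite = "la\u0300"    # L, then A, then COMBINING GRAVE ACCENT


def same_text(a, b):
    """Hypothetical helper: compare two strings after NFC normalization.

    NFC comes from the Unicode normalization forms, not from the 10646
    clauses quoted in this thread.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain code-by-code comparison treats the two spellings as different.
raw_equal = precomposed == composite

# Only after applying an equivalence rule do they compare as equal.
normalized_equal = same_text(precomposed, composite)

# The single code 00E1 is the one named LATIN SMALL LETTER A WITH ACUTE.
name_00e1 = unicodedata.name("\u00e1")
```

Without an agreed equivalence rule, software that compares the coded
representations directly sees two different strings, which is the
interoperability concern behind preferring the single coded character.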
Received on Monday, 21 October 1996 16:45:17 UTC