
Re: Internationalized CLASS attributes



Keld Jørn Simonsen wrote:
> I believe that since we are talking international environments, and the
> HTML language, we are using ISO/IEC 10646 as prescribed in the
> HTML specifications, and then coding characters according to that.
> The only way defined in 10646 to encode LATIN SMALL LETTER A WITH ACUTE
> is as 00E1; other ways of coding this character are not defined in
> 10646.

Keld, this is misleading, although correct in a very narrow sense: the
letter followed by the combining character is defined as a "composite
sequence", not as a "character". However, there are several indications
to the contrary in the text.
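To make the distinction concrete, here is a minimal sketch in Python, assuming its standard `unicodedata` module (whose character names match the 10646 names for these code points). It shows that the precomposed character is one code point while the composite sequence is two, and that the two strings are not equal as raw code point sequences. The helper `describe` is hypothetical, written only for this illustration.

```python
import unicodedata

def describe(s: str) -> list[str]:
    """Hypothetical helper: list 'U+XXXX NAME' for each code point in s."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]

precomposed = "\u00e1"   # a single character: LATIN SMALL LETTER A WITH ACUTE
composite = "a\u0301"    # composite sequence: base letter + COMBINING ACUTE ACCENT

for label, s in [("precomposed", precomposed), ("composite", composite)]:
    print(label, len(s), describe(s))

# Compared code point by code point, the two representations differ.
print(precomposed == composite)  # False
```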

I refer you, for example, to clause 15, and would like to note that
nowhere is it specified that HTML is restricted to implementation
level 1. On the contrary, it is obvious that the full implementation,
level 3, is specified.

"15  Implementation levels

"ISO/IEC 10646 specifies three levels of implementation. Combining
characters are described in 23 and listed in annex B.

"15.1  Implementation level 1

"When implementation level 1 is used, a CC-data-element shall not
contain coded representations of combining characters (see clause B.1)
nor of characters from HANGUL JAMO block (see clause 24)."
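The level 1 restriction can be sketched as a simple check over the code points of a text. This is an approximation under stated assumptions: "combining character" is taken to mean Unicode General Category Mn, Mc or Me rather than the exact list in annex B, and the HANGUL JAMO block is assumed to span U+1100..U+11FF. The function name `violates_level1` is hypothetical.

```python
import unicodedata

# Assumption: HANGUL JAMO block boundaries U+1100..U+11FF.
HANGUL_JAMO = range(0x1100, 0x1200)

def violates_level1(text: str) -> list[tuple[int, str]]:
    """Hypothetical check: return (index, 'U+XXXX') for each code point
    that implementation level 1 would exclude.

    Approximation: combining characters are identified by Unicode General
    Category (Mn, Mc, Me), not by the annex B list of 10646 itself.
    """
    bad = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if unicodedata.category(ch) in ("Mn", "Mc", "Me") or cp in HANGUL_JAMO:
            bad.append((i, f"U+{cp:04X}"))
    return bad

print(violates_level1("l\u00e0"))   # [] (precomposed a with grave)
print(violates_level1("la\u0300"))  # [(2, 'U+0300')] (COMBINING GRAVE ACCENT)
```

At level 3, by contrast, no code point is excluded, so a check like this simply would not apply.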

...

"15.3  Implementation level 3

"When implementation level 3 is used, a CC-data-element may contain
coded representations of any characters."

See also clause 23.1: "for example, coded representations of LATIN SMALL
LETTER A followed by COMBINING TILDE represent a composite sequence for
Latin "ã" [LATIN SMALL LETTER A WITH TILDE]".

See also the note at the end of clause 23.3: "NOTE - Where combining
characters are used for the generation of composite sequences in
implementation level 3, this facility may be used to provide an
alternative coded representation of text. For example, in implementation
level 3 the French word "là" may be represented by the characters LATIN
SMALL LETTER L followed by LATIN SMALL LETTER A WITH GRAVE, or may be
represented by the characters LATIN SMALL LETTER L followed by LATIN
SMALL LETTER A followed by COMBINING GRAVE ACCENT".
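The note only says that both representations of "là" are permitted; the clauses quoted here do not say how software should treat them as the same text. One later mechanism for that is the Unicode normalization forms (NFC and NFD, from the Unicode Standard rather than from 10646 as quoted). The sketch below, assuming Python's `unicodedata.normalize`, shows that the two spellings differ code point by code point but match once normalized. For something like comparing CLASS attribute values, this is why a naive comparison would treat them as different names.

```python
import unicodedata

one = "l\u00e0"   # LATIN SMALL LETTER L + LATIN SMALL LETTER A WITH GRAVE
two = "la\u0300"  # LATIN SMALL LETTER L + LATIN SMALL LETTER A + COMBINING GRAVE ACCENT

print(one == two)                                # False: different code points
print(unicodedata.normalize("NFC", two) == one)  # True: composed (precomposed) form
print(unicodedata.normalize("NFD", one) == two)  # True: decomposed form
```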

I don't see the value of the 10646 implementation levels in the
international milieu. They are contrary to the purpose of achieving a
truly international character code. If people want an inter-European or
North Atlantic character code, they are welcome to it, and it is a good
thing if they base it on 10646, as long as they realize that it is just
a regional solution.

The question of precomposed characters vs. composite sequences was part
of the Unicode/10646 compromise that made the approval of 10646
possible. It happens from time to time in standards work that the
authorized committees make decisions that are contrary to my opinions.
The only way to make progress and to get on with the work is to accept
these decisions and live with them.

Once the standard is published, that's it: people out there implement it
the way they understand it, and most of them only know and care about
the actual standard document. All the working papers, minutes,
submissions, explanations, etc. that are not incorporated in the
document are from that point on moot and irrelevant.

Best regards,

Jonathan Rosenne

