Re: Internationalized CLASS attributes



Jonathan Rosenne writes:

> Keld Jørn Simonsen wrote:
> > I believe that since we are talking international environments, and the
> > HTML language, we are using ISO/IEC 10646 as prescribed in the
> > HTML specifications, and then coding characters according to that.
> > The only way defined in 10646 to encode LATIN SMALL LETTER A WITH ACUTE
> > is as 00E1; other ways of defining this character are not defined in
> > 10646.
> 
> Keld, this is misleading albeit correct in a very narrow sense, the
> letter followed by the combining character having been defined as a
> "composite sequence" and not as a "character". However, there are
> several counter-indications in the text.

Could you point me to some? I do not read clause 15 below as
normatively specifying other ways of encoding A-GRAVE than 00E0.
I do not think that what I have said is misleading about what is
normatively specified in 10646 and what is not. I invite you to
tell me where the encoding of A-GRAVE is normatively defined with
combining characters in the 10646 standard.
> 
> I refer you, for example, to clause 15, and would like to note that
> nowhere was it specified that HTML will be restricted to implementation
> level 1. On the contrary, it is obvious that the full implementation,
> level 3, is specified.

I agree that HTML uses full canonical 10646.

I do not see that clause 15 below defines an unambiguous coding of
information using combining characters. It clearly allows it, as I have
also said before, but in an unspecified, implementation-defined way.
All I am saying is that we should avoid the unspecified,
implementation-defined encoding and stick to what is well defined in
the standard.

That also allows us to use other specifications that build on the
well-defined encoding of 10646.
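
As a rough illustration of the level 1 restriction in clause 15.1 quoted
below, here is a minimal Python sketch. The function name fits_level_1 is
hypothetical, and the list of combining characters in annex B is not
reproduced here: as an assumption, the Unicode general category "Mark"
from Python's unicodedata module stands in for it, so the result is only
an approximation of what the standard specifies.

```python
import unicodedata

# HANGUL JAMO block, U+1100..U+11FF (excluded at level 1 per clause 15.1).
HANGUL_JAMO = range(0x1100, 0x1200)


def fits_level_1(text):
    """Approximate check that text uses no combining characters.

    Assumption: any character whose Unicode general category starts
    with "M" (Mark) is treated as a combining character. This is a
    stand-in for the annex B list, not a copy of it.
    """
    for ch in text:
        if unicodedata.category(ch).startswith("M"):
            return False
        if ord(ch) in HANGUL_JAMO:
            return False
    return True


print(fits_level_1("l\u00e0"))    # precomposed A WITH GRAVE
print(fits_level_1("la\u0300"))   # A followed by COMBINING GRAVE ACCENT
```

The first string uses only the precomposed character and passes the
check; the second contains a combining character and does not.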

> "15  Implementation levels
> 
> "ISO/IEC 10646 specifies three levels of implementation. Combining
> characters are described in 23 and listed in annex B.
> 
> "15.1  Implementation level 1
> 
> "When implementation level 1 is used, a CC-data-element shall not
> contain coded representations of combining characters (see clause B.1)
> nor of characters from HANGUL JAMO block (see clause 24)."
> 
> ...
> 
> "15.3  Implementation level 3
> 
> "When implementation level 3 is used, a CC-data-element may contain
> coded representations of any characters."
> 
> See also clause 23.1: "for example, coded representations of LATIN SMALL
> LETTER A followed by COMBINING TILDE represent a composite sequence for
> Latin "ã" [LATIN SMALL LETTER A WITH TILDE]".
> 
> See also the note at the end of clause 23.3: "NOTE - Where combining
> characters are used for the generation of composite sequences in
> implementation level 3, this facility may be used to provide an
> alternative coded representation of text. For example, in implementation
> level 3 the French word "là" may be represented by the characters LATIN
> SMALL LETTER L followed by LATIN SMALL LETTER A WITH GRAVE, or may be
> represented by the characters LATIN SMALL LETTER L followed by LATIN
> SMALL LETTER A followed by COMBINING GRAVE ACCENT".
> 
> I don't see the value of the 10646 implementation levels in the
> international milieu. They are contrary to the purpose of achieving a
> truly international character code. If people want an inter-European or
> North Atlantic character code they are welcome, and it is a good thing
> they base it on 10646, as long as they do realize that it is just a
> regional solution.

I agree that level 3 should be allowed in international encoding.
But I also advise that if people all over the world would like to
encode an A-GRAVE, then they should use the encoding defined
in 10646 for that character, namely 00E0.
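
The two representations of the French word in the clause 23.3 note quoted
above can be compared concretely. The following is a minimal Python
sketch; the helper name code_points is hypothetical. The last step uses
Unicode NFC normalization from Python's unicodedata module, which is an
assumption brought in for illustration and is not something 10646 itself
defines.

```python
import unicodedata

# The two representations from the clause 23.3 note.
precomposed = "l\u00e0"   # L, then LATIN SMALL LETTER A WITH GRAVE
composite = "la\u0300"    # L, then A, then COMBINING GRAVE ACCENT


def code_points(text):
    """Return the code points of text as four-digit hex strings."""
    return [f"{ord(ch):04X}" for ch in text]


print(code_points(precomposed))   # ['006C', '00E0']
print(code_points(composite))     # ['006C', '0061', '0300']

# A plain code-point comparison treats the two as different strings.
print(precomposed == composite)   # False

# Assumption: NFC normalization (Unicode, not 10646) composes the
# sequence into the single precomposed character.
print(unicodedata.normalize("NFC", composite) == precomposed)   # True
```

Without some agreed mapping of this kind, a receiver comparing coded
representations directly will see two different strings.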

> The question of composed vs. composite was part of the Unicode - 10646
> compromise that made the approval of 10646 possible. It happens from
> time to time in standards work that the authorized committees make
> decisions that are contrary to my opinions. The only way to make
> progress and to get on with the work is to accept these decisions and
> live with them.

Yes, SC2/WG2 has decided that equivalence tables should not be
standardized, as they are culturally offensive. So why don't we
just live with this decision and avoid them?

> Once the standard is published, that's it, people out there are
> implementing it the way they understand it - and most of them only know
> and care about the actual standard document, and all the working papers,
> minutes, submissions, explanations etc. that are not incorporated in the
> document are from that point on moot and irrelevant.

I agree that we should stick to the standards as they are, and not
incorporate a number of things that we would like to see,
but were not defined in the standards.

Keld

