- From: Keld Jørn Simonsen <keld@dkuug.dk>
- Date: Thu, 24 Oct 1996 17:32:29 +0200
- To: Martin J Duerst <mduerst@ifi.unizh.ch>
- Cc: rosenne@NetVision.net.il, www-international@w3.org
Martin J Duerst writes:

> So while I agree that bad display can be inconvenient (but in
> some cases, if it's the only way given limited resources, it
> might be considered better than nothing) or even offensive,
> this has nothing to do with the decision whether to internally
> store things precomposed or decomposed.

I agree that the encoding of one character in one, two, three or any other number of bytes is not very important as long as it is unambiguous. It does matter for the size of the stored or transmitted data, but that is not the subject of this discussion.

We are not talking about the coding of one character, however, but about decomposing an entity into two or more characters. An entity such as a letter with a combining accent can be decomposed into two logical entities; that is not the case for Ø, which is a separate letter. You cannot split Ø into any components, and in any case 10646 gives no definition of how to split Ø into smaller components.

There are also a number of problems in doing so, such as that there are *two* combining accents that might be valid: the short and the long combining solidus overlay, 0337 and 0338. Should a small ø be decomposed using the short or the long character, and what about the capital Ø? What about converting between upper and lower case for these two combining characters? Does converting from small to capital imply converting short to long? And would that also hold for the "decomposed" forms of L and H etc. with solidus?

Coding Ø or accented letters as "decomposed" combining sequences tries to introduce more than one way of encoding the same information in 10646, and specifying equivalent encodings with combining characters has been rejected by SC2/WG2. It would also conflict with the definition of a coded character set.

Keld
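A minimal sketch of the two points above, assuming Python's standard unicodedata module and present-day Unicode character data (which did not exist in this form in 1996): U+00D8/U+00F8 (Ø/ø) carry no decomposition mapping, unlike a letter with an ordinary accent, and simple case mapping leaves a combining solidus overlay unchanged rather than promoting short to long.

```python
import unicodedata

# U+00D8 (Ø) has no decomposition mapping in UnicodeData.txt,
# so NFD normalization leaves it as a single code point.
print(unicodedata.decomposition("\u00D8"))      # '' (empty: no decomposition)
print(unicodedata.normalize("NFD", "\u00D8"))   # 'Ø' (unchanged)

# A letter with an ordinary accent does decompose.
print(unicodedata.decomposition("\u00E9"))      # '0065 0301' (e + combining acute)

# Case mapping is applied per character: upper-casing o + U+0337
# (short solidus overlay) keeps the short overlay; it is not
# turned into U+0338 (long solidus overlay).
seq = "o\u0337"
print([hex(ord(c)) for c in seq.upper()])       # ['0x4f', '0x337']
```

So under current data the precomposed Ø remains atomic, and a hypothetical combining-sequence encoding would leave the short-versus-long overlay choice, and its behaviour under case conversion, to convention.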
Received on Thursday, 24 October 1996 11:34:09 UTC