- From: Keld Jørn Simonsen <keld@dkuug.dk>
- Date: Thu, 24 Oct 1996 17:32:29 +0200
- To: Martin J Duerst <mduerst@ifi.unizh.ch>
- Cc: rosenne@NetVision.net.il, www-international@w3.org
Martin J Duerst writes:

> So while I agree that bad display can be inconvenient (but in
> some cases, if it's the only way given limited resources, it
> might be considered better than nothing) or even offensive,
> this has nothing to do with the decision whether to internally
> store things precomposed or decomposed.

I agree that the encoding of one character in one, two, three or any other number of bytes is not very important as long as it is unambiguous. It does matter for the size of the stored or transmitted data, but that is not the subject of this discussion.

We are not talking about the coding of one character, however, but about decomposing an entity into two or more characters. An entity such as a letter with a combining accent can be decomposed into two logical entities; that is not the case for Ø, which is a separate letter. You cannot split Ø into any components, and in any case 10646 gives no definition of how to split Ø into smaller components.

There are also a number of problems in doing so, such as that there are *two* combining accents that might be valid: the short and the long combining solidus overlay, 0337 and 0338. Should a small ø be decomposed using the short or the long character, and what about the capital Ø? What about converting between upper and lower case for these two combining characters? Does converting from small to capital imply converting short to long? And would that also hold for the "decomposed" forms of L and H etc. with solidus?

Coding Ø or accented letters as "decomposed" combining sequences tries to introduce more than one way of encoding the same information in 10646, and specifying equivalent encodings with combining characters has been rejected by SC2/WG2. It would also conflict with the definition of a coded character set.

Keld
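A minimal sketch of the two points above, assuming Python's standard unicodedata module and present-day Unicode character data (which did not exist in this form in 1996): U+00D8/U+00F8 (Ø/ø) carry no decomposition mapping, unlike a letter with an ordinary accent, and simple case mapping leaves a combining solidus overlay unchanged rather than promoting short to long.

```python
import unicodedata

# U+00D8 (Ø) has no decomposition mapping in UnicodeData.txt,
# so NFD normalization leaves it as a single code point.
print(unicodedata.decomposition("\u00D8"))      # '' (empty: no decomposition)
print(unicodedata.normalize("NFD", "\u00D8"))   # 'Ø' (unchanged)

# A letter with an ordinary accent does decompose.
print(unicodedata.decomposition("\u00E9"))      # '0065 0301' (e + combining acute)

# Case mapping is applied per character: upper-casing o + U+0337
# (short solidus overlay) keeps the short overlay; it is not
# turned into U+0338 (long solidus overlay).
seq = "o\u0337"
print([hex(ord(c)) for c in seq.upper()])       # ['0x4f', '0x337']
```

So under current data the precomposed Ø remains atomic, and a hypothetical combining-sequence encoding would leave the short-versus-long overlay choice, and its behaviour under case conversion, to convention.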
Received on Thursday, 24 October 1996 11:34:09 UTC