Re: Internationalized CLASS attributes
Keld Simonsen wrote:
>Martin J Duerst writes:
>> Keld Simonsen wrote:
>> >Martin J Duerst writes:
>> Does it say explicitly that an application is forbidden to
>> treat the two representations as equivalent, or to normalize
>> to one or the other? Or does it say that a system is forbidden
>> (on level 3) to use the sequence of LATIN CAPITAL LETTER A and
>> COMBINING GRAVE ACCENT? If yes, can you tell me in which chapter
>> (please no page numbers, I only have a Japanese translation)
>> it says so?
>No, 10646 does not forbid that. But it is not defined in 10646, and you
>can do it in an application. But why not just do as the standard
>prescribes and encode a character when you mean that character?
>Then you are following international standards.
Well, somebody may encode A-with-GRAVE, because (s)he sees it as
a single character, and it appears as such on the keyboard.
And somebody else may encode A followed by GRAVE, because
the GRAVE is on a separate key, and e.g. as a tone can go on
any vowel (of course, what the user enters and what the system
does may well be two different things). And strictly by ISO 10646,
these might be two different things. ISO 10646 does not prescribe
that an A followed by a combining GRAVE is illegal, or should not
be used, just because A-with-GRAVE exists as a separate codepoint.
And whether a user means "that character" or is more familiar with
thinking of it as two different things is important for the local
user interface, but as both of these things appear (or should appear)
in the same way on the screen, even if ISO 10646 does not specify
any equivalence, it makes sense to specify equivalences on the
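
To make the two encodings concrete, here is a minimal sketch in Python. It relies on the normalization forms (NFC) of the Unicode Standard as exposed by Python's `unicodedata` module; that is an assumption brought in for illustration, not something ISO 10646 itself defines, and the helper name `equivalent` is hypothetical.

```python
import unicodedata

# The two encodings discussed above.
precomposed = "\u00C0"   # LATIN CAPITAL LETTER A WITH GRAVE (one code point)
decomposed = "A\u0300"   # LATIN CAPITAL LETTER A + COMBINING GRAVE ACCENT

def equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# A plain code-point comparison sees two different strings ...
raw_equal = precomposed == decomposed

# ... while an application-level equivalence treats them as the same text.
normalized_equal = equivalent(precomposed, decomposed)
```

Under these assumptions the raw comparison fails and the normalized one succeeds, which is the kind of equivalence an application could choose to apply on top of the coded character set.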
>Well, if you have a 10646 character there is only one way
>to encode it. There is no decomposition in 10646.
A user does not have or see ISO 10646 characters. A user
sees and deals with things on the screen and on paper.
ISO 10646 characters are abstract entities, and we have
to make sure, where possible, that the application takes
provisions to reconcile these abstract entities with the
expectations of the user if the expectations of the user
>> For the above example, imagine a tonal language such as
>> Chinese. For many applications, it may be more convenient
>> to be able to detach tone accents by removing characters
>> than to do conversions from one codepoint to another.
>Could it not just as conveniently be handled with the ordinary
>characters of 10646?
Combining characters are 10646 characters too, and indispensable
for some languages. Calling everything else "ordinary" is not
very friendly to these languages.
Also, in some cases, no precombined characters are available,
so "just handling it with 'ordinary'" characters is not possible.
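
A small sketch may help with both points, again using Python's `unicodedata` module and Unicode normalization as stand-ins (an assumption, since ISO 10646 defines no decomposition). The helper `strip_marks` is a hypothetical name, and the Pinyin sample and the m-with-grave example are chosen only for illustration.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Hypothetical helper: decompose to NFD, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a nonzero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Pinyin written with precomposed letters (i with caron, a with caron).
pinyin = "n\u01D0 h\u01CEo"
toneless = strip_marks(pinyin)

# m + COMBINING GRAVE ACCENT: there is no precomposed m-with-grave,
# so composing (NFC) leaves the sequence as two code points.
m_grave = "m\u0300"
composed_length = len(unicodedata.normalize("NFC", m_grave))
```

Once the text is in decomposed form, detaching the tone accents is just a matter of removing characters, and the last two lines show a letter that can only be written with a combining character.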