
Re: Internationalized CLASS attributes



Martin J Duerst writes:

> Keld Simonsen wrote:
> 
> >Martin J Duerst writes:
> 
> Does it say explicitly that an application is forbidden to
> treat the two representations as equivalent, or to normalize
> to one or the other? Or does it say that a system is forbidden
> (on level 3) to use the sequence of LATIN CAPITAL LETTER A and
> COMBINING GRAVE ACCENT? If yes, can you tell me in which chapter
> (please no page numbers, I only have a Japanese translation)
> it says so?

No, 10646 does not forbid that. Such an equivalence is not defined in
10646, but you can implement it in an application. Still, why not simply
encode a character the way the standard prescribes when you mean that
character? Then you are following international standards.
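As an illustration of doing this in an application, here is a minimal
sketch in Python. It assumes Unicode normalization (Unicode Standard
Annex #15, exposed through Python's standard unicodedata module) as the
equivalence rule. That mechanism belongs to the Unicode Standard, not to
the ISO 10646 text discussed here, and the helper name same_text is
hypothetical.

```python
import unicodedata

# Two spellings of "A with grave": one precomposed code point, or
# LATIN CAPITAL LETTER A followed by COMBINING GRAVE ACCENT.
precomposed = "\u00C0"   # LATIN CAPITAL LETTER A WITH GRAVE
decomposed = "A\u0300"   # LATIN CAPITAL LETTER A + COMBINING GRAVE ACCENT

def same_text(a: str, b: str) -> bool:
    """Treat canonically equivalent strings as equal by normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(precomposed == decomposed)           # False: different code point sequences
print(same_text(precomposed, decomposed))  # True: equal after normalization
print([hex(ord(c)) for c in unicodedata.normalize("NFD", precomposed)])  # ['0x41', '0x300']
```

Normalizing to one form on input lets the rest of the application compare
plain strings, whichever representation the user typed.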

By the way, the text of 10646 with the latest addenda is available
in MS Word format at ftp://dkuug.dk/JTC1/SC2/WG2/docs/N1396.doc
(about 1 MB).

> >> Such an interpretation may not conflict with ISO 10646, but it clearly
> >> does not help any user. ISO 10646 also does not prohibit collapsing
> >> these two representations for the benefit of the user.
> >
> >I would rather say that for the benefit of the user you
> >should only encode a character in one way, and that is the encoding
> >of 10646. You should not engage in artificial decomposition
> >of characters, that only complicates things.
> 
> I agree that a system should only encode characters in one way.
> But the very way you say it suggests that there is more than
> one way. Also, while in one system or language using precomposed
> characters is the natural way to do things, in another system
> using decomposition may be the natural way to do things.

Well, if you have a 10646 character, there is only one way
to encode it. There is no decomposition in 10646.

> For the above example, imagine a tonal language such as
> Chinese. For many applications, it may be more convenient
> to be able to detach tone accents by removing characters
> than to do conversions from one codepoint to another.

Could it not just as conveniently be handled with the ordinary
characters of 10646?
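Martin's tone-accent point can be sketched the same way. Assuming Hanyu
Pinyin tone marks (macron, acute, caron and grave for tones 1 to 4) and
Python's unicodedata module, the hypothetical strip_tones function below
decomposes a syllable, removes the tone character, and recomposes the
rest. With text that is already fully decomposed, only the middle step
would be needed.

```python
import unicodedata
from typing import Optional, Tuple

# Hanyu Pinyin tone marks as combining characters (assumed convention).
TONE_MARKS = {
    "\u0304": 1,  # COMBINING MACRON
    "\u0301": 2,  # COMBINING ACUTE ACCENT
    "\u030C": 3,  # COMBINING CARON
    "\u0300": 4,  # COMBINING GRAVE ACCENT
}

def strip_tones(syllable: str) -> Tuple[str, Optional[int]]:
    """Detach the tone mark from a Pinyin syllable by removing a character."""
    # Split precomposed letters such as U+01CE into base + combining mark.
    decomposed = unicodedata.normalize("NFD", syllable)
    tone = None
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]  # remember the tone, drop the character
        else:
            kept.append(ch)        # other marks (e.g. the diaeresis on u) stay
    # Recompose remaining letters, e.g. u + U+0308 becomes U+00FC.
    return unicodedata.normalize("NFC", "".join(kept)), tone

print(strip_tones("m\u01ce"))  # ('ma', 3)
print(strip_tones("l\u01dc"))  # ('lü', 4)
```

Only the four tone marks are removed, so the diaeresis that distinguishes
ü from u survives, which a blanket "drop all combining marks" rule would
not preserve.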

> Also, for an application that has to use combining characters
> for languages and special applications that don't have all their
> combinations precomposed in ISO 10646, it may be much more
> straightforward to have everything composing, and no precomposed
> codepoints. Software has to deal with composition anyway if
> it deals with pointed Arabic and Hebrew, or with Indic languages.
> 
> Also, actual implementation experience and recent discussions
> show that the effort to deal with composition is mostly
> overestimated.

That's good to hear.
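To illustrate the kind of composition handling Martin says software
needs anyway for pointed Hebrew and Indic scripts, here is a deliberately
simplified Python sketch (the function name clusters is hypothetical).
It attaches every character in a Unicode mark category (Mn, Mc, Me) to
the preceding base character. Full grapheme segmentation, as in Unicode
Standard Annex #29, has more rules than this.

```python
import unicodedata
from typing import List

def clusters(text: str) -> List[str]:
    """Group each base character with the combining marks that follow it.

    Simplified: any character whose Unicode category starts with "M"
    (Mn, Mc, Me) is appended to the previous cluster.
    """
    result: List[str] = []
    for ch in text:
        if result and unicodedata.category(ch).startswith("M"):
            result[-1] += ch   # combining mark: attach to the current base
        else:
            result.append(ch)  # base character: start a new cluster
    return result

# Pointed Hebrew "shalom": shin+qamats+shin dot, lamed, vav+holam, final mem.
print(len(clusters("\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd")))  # 4
# Devanagari "hindi": vowel signs U+093F/U+0940 are Mc, anusvara U+0902 is Mn.
print(len(clusters("\u0939\u093f\u0902\u0926\u0940")))  # 2
```

The same loop covers Latin sequences that have no precomposed form, such
as q followed by COMBINING ACUTE ACCENT, without any special casing.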

Keld

