Re: Internationalized CLASS attributes



Chris Lilley writes:

> Another relevant quote from the Unicode standard, on the subject of case
> conversion:
> 
> "Because there are many more lowercase forms than there are upper, it is
> recommended that the lowercase be used for normalisation rather than the
> uppercase, such as when strings are case-folded for loose comparison or
> indexing."

I see two things here:

1. Some characters have only a lower case form, so converting
to upper case is not possible. Examples: German <ss>, Greenlandic <kra>.

2. Several lower case forms may exist where there is only one upper
case form. Example: Greek sigma, which has a separate terminal form.

In the first case I can see a reason to normalize on lower case,
but in the second case I see problems in choosing which lower case
form to normalize on.
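
To make the two cases concrete, here is a small sketch in Python. It
assumes the case mappings built into Python 3 strings, which follow the
Unicode character database; the variable names are only for illustration.

```python
# Case 1: lower case letters with no single-character upper case form.
sharp_s = "\u00df"   # German sharp s, <ss>
kra = "\u0138"       # Greenlandic <kra>

sharp_s_upper = sharp_s.upper()   # expands to the two letters "SS"
kra_upper = kra.upper()           # no upper case mapping, comes back unchanged

# Case 2: two distinct lower case forms sharing one upper case form.
sigma = "\u03c3"          # Greek small letter sigma
final_sigma = "\u03c2"    # Greek small letter final (terminal) sigma
capital_sigma = "\u03a3"  # Greek capital letter sigma

# Both lower case forms map to the same capital, so going back from the
# capital to "the" lower case form needs a choice between the two.
sigma_uppers = (sigma.upper(), final_sigma.upper())
```

Neither direction is a clean one-to-one mapping, which is the problem
with picking a single normalized form.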

I would rather that you did not normalize at all, but instead made a
case-independent, or case-and-accent-independent, comparison, for
example using the functions and tables of the forthcoming ISO sorting
standard ISO/IEC 14651.
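
A rough sketch of what such a comparison could look like, in Python.
This is not ISO/IEC 14651: it is a stand-in that assumes Unicode case
folding and canonical decomposition from the standard `unicodedata`
module, and the helper names `loose_key` and `loose_equal` are
hypothetical. The stored strings are left as they are; only the
comparison ignores case and accents.

```python
import unicodedata


def loose_key(text: str) -> str:
    """Hypothetical helper: build a key that ignores case and accents."""
    # casefold() is a context-free fold, so both Greek sigmas fold alike
    # and the German sharp s folds to "ss".
    folded = text.casefold()
    # NFD splits an accented letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", folded)
    # Drop the combining marks, keeping only the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def loose_equal(a: str, b: str) -> bool:
    """Compare two strings without regard to case or accents."""
    return loose_key(a) == loose_key(b)
```

With this sketch, a word ending in terminal sigma compares equal to its
all-capitals spelling, and an accented word compares equal to its
unaccented one. A real collation table would also give an ordering and
language-specific tailoring, which this does not attempt.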

Keld

