Re: Internationalized CLASS attributes

On Oct 17,  8:37pm, Keld Jørn Simonsen wrote:

> 1. some characters may only have a lower case form, so converting
> to upper case is not possible. Example: German <ss>, Greenlandic <kra>.

Yes. As the quote says, there are "many more lowercase forms than there are
upper". Hence the recommendation that if case folding is performed, the
conversion is to lower case.

> 2. a number of lower case forms exists where there is only one upper
> case form, example Greek sigma, where there is a terminal sigma.
>
> In the first instance I can see a reason to normalize on lower-case,
> but in the second case I see problems in choosing which lower case
> to normalize on.

Yes. There is also a problem with accented characters, as the
typewriter-inflicted convention in some languages is to omit accents on
upper-case letters. I certainly did not mean to suggest that folding to
lower case was problem free; rather that there are more problems when
folding to upper case.
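
To make the two cases concrete, here is a minimal sketch in Python. It
assumes the Unicode case mappings behind Python's built-in str.upper() and
str.lower(), which are not necessarily the tables an HTML user agent would
use; the constant and function names are mine, for illustration only.

```python
# Illustration only: simple folding via Python's built-in Unicode case
# mappings. The names below are hypothetical, not from any specification.
SHARP_S = "\u00df"        # German sharp s
KRA = "\u0138"            # Greenlandic kra
SIGMA = "\u03c3"          # Greek small sigma
FINAL_SIGMA = "\u03c2"    # Greek small final sigma
CAPITAL_SIGMA = "\u03a3"  # Greek capital sigma


def fold_upper(text):
    """Fold to upper case; lossy for letters that lack an upper-case form."""
    return text.upper()


def fold_lower(text):
    """Fold to lower case; keeps the two lower-case sigmas distinct."""
    return text.lower()


def round_trips_through_upper(text):
    """True if upper-casing and then lower-casing gives the text back."""
    return fold_upper(text).lower() == text


if __name__ == "__main__":
    for label, ch in [("sharp s", SHARP_S), ("kra", KRA),
                      ("sigma", SIGMA), ("final sigma", FINAL_SIGMA)]:
        print(label, repr(fold_upper(ch)), repr(fold_lower(ch)),
              round_trips_through_upper(ch))
```

With these mappings, sharp s has no single upper-case letter and expands to
two characters, kra is left unchanged, and both lower-case sigmas collapse
onto the one capital sigma. Folding to lower case leaves all four as they
were, which also means it does not by itself make the two sigmas compare
equal.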

> I would rather that you did not normalize, but made a case-independent,
> or case-and-accent-independent comparison,

Sorry, could you explain how a case-independent comparison differs from case
folding (or normalization)?
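
One possible reading, which may not be what you intend, is that the stored
value is left untouched and only a derived key is compared. A rough sketch
in Python follows, assuming Unicode's str.casefold() and canonical
decomposition from the unicodedata module as stand-ins; it does not use the
ISO/IEC 14651 tables, and comparison_key and same_class are hypothetical
names.

```python
import unicodedata


def comparison_key(name, ignore_accents=False):
    """Build a key for comparison only; the original string is not changed.

    Assumption: Unicode full case folding stands in for whatever tables a
    real implementation would use.
    """
    # Decompose so that accents become separate combining marks.
    key = unicodedata.normalize("NFD", name.casefold())
    if ignore_accents:
        # Drop the combining marks to get an accent-independent key.
        key = "".join(ch for ch in key if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", key)


def same_class(a, b, ignore_accents=False):
    """Compare two CLASS values without rewriting either of them."""
    return comparison_key(a, ignore_accents) == comparison_key(b, ignore_accents)


if __name__ == "__main__":
    print(same_class("Stra\u00dfe", "STRASSE"))
    print(same_class("RESUME", "r\u00e9sum\u00e9"))
    print(same_class("RESUME", "r\u00e9sum\u00e9", ignore_accents=True))
```

If that is the distinction, then the difference from folding would be in
what gets stored and displayed rather than in what matches: the attribute
keeps the case and accents the author wrote, and the key exists only for
the duration of the comparison.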

> for example using the functions and tables of the forthcoming ISO
> sorting standard ISO/IEC 14651.

Thanks for the reference. Are these tables available online?


-- 
Chris Lilley, W3C                          [ http://www.w3.org/ ]
Graphics and Fonts Guy            The World Wide Web Consortium
http://www.w3.org/people/chris/              INRIA,  Projet W3C
chris@w3.org                       2004 Rt des Lucioles / BP 93
+33 93 65 79 87            06902 Sophia Antipolis Cedex, France

Received on Thursday, 17 October 1996 15:39:27 UTC