- From: Chris Lilley <Chris.Lilley@sophia.inria.fr>
- Date: Thu, 17 Oct 1996 21:39:16 +0200 (DST)
- To: keld@dkuug.dk (Keld J|rn Simonsen), Chris Lilley <Chris.Lilley@sophia.inria.fr>, Jonathan Rosenne <rosenne@NetVision.net.il>, WWW-International List <www-international@w3.org>
On Oct 17, 8:37pm, Keld J|rn Simonsen wrote:

> 1. some characters may only have a lower case form, so converting
> to upper case is not possible. Example: German <ss>, Greenlandic <kra>.

Yes. As the quote says, there are "many more lowercase forms than there are upper". Hence the recommendation that if case folding is performed, the conversion is to lower case.

> 2. a number of lower case forms exists where there is only one upper
> case form, example Greek sigma, where there is a terminal sigma.
>
> In the first instance I can see a reason to normalize on lower-case,
> but in the second case I see problems in choosing which lower case
> to normalize on.

Yes. There is also a problem with accented characters, as the typewriter-inflicted convention in some languages is to omit accents on upper-case letters.

I certainly did not mean to suggest that folding to lower case was problem-free; rather that there are more problems when folding to upper case.

> I would rather that you did not normalize, but made a case-independent,
> or case-and-accent-independent comparison,

Sorry, could you explain how a case-independent comparison differs from case folding (or normalization)?

> for example using the functions and tables of the forthcoming ISO
> sorting standard ISO/IEC 14651.

Thanks for the reference. Are these tables available online?

--
Chris Lilley, W3C                    [ http://www.w3.org/ ]
Graphics and Fonts Guy               The World Wide Web Consortium
http://www.w3.org/people/chris/      INRIA, Projet W3C
chris@w3.org                         2004 Rt des Lucioles / BP 93
+33 93 65 79 87                      06902 Sophia Antipolis Cedex, France
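
The cases discussed in this message (German sharp s, Greenlandic kra, Greek terminal sigma, unaccented capitals) can be illustrated with a small sketch. It assumes a modern Python 3 and its built-in Unicode case mappings, which are a stand-in here and not the ISO/IEC 14651 tables mentioned above. The helper names `fold_case` and `fold_case_and_accents` are hypothetical, chosen only for illustration.

```python
import unicodedata

SHARP_S = "\u00df"      # German sharp s
KRA = "\u0138"          # Greenlandic kra
SIGMA = "\u03c3"        # Greek small sigma
FINAL_SIGMA = "\u03c2"  # Greek small terminal sigma


def fold_case(text: str) -> str:
    """Case-independent comparison key using Unicode case folding."""
    return text.casefold()


def fold_case_and_accents(text: str) -> str:
    """Case-and-accent-independent key: fold case, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Point 1: these letters have no single upper-case counterpart.
# upper() expands sharp s to two letters and leaves kra unchanged.
sharp_s_upper = SHARP_S.upper()
kra_upper = KRA.upper()

# Point 2: both lower-case sigmas share one upper-case form, and plain
# lower-casing keeps them distinct, so it does not pick a single target.
sigmas_share_upper = SIGMA.upper() == FINAL_SIGMA.upper()
sigmas_differ_lower = SIGMA.lower() != FINAL_SIGMA.lower()

# Case folding maps both sigmas to one form, so the comparison succeeds.
sigmas_equal_folded = fold_case(SIGMA) == fold_case(FINAL_SIGMA)

# Accents omitted on capitals: only the accent-independent key matches.
accent_match = fold_case_and_accents("ECOLE") == fold_case_and_accents("\u00e9cole")
```

In this sketch the comparison keys are computed on the fly and the stored text is left untouched, which is one way to read the difference between a case-independent comparison and normalizing the data itself.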
Received on Thursday, 17 October 1996 15:39:27 UTC