- From: Chris Lilley <Chris.Lilley@sophia.inria.fr>
- Date: Thu, 17 Oct 1996 15:00:51 +0200 (DST)
- To: Jonathan Rosenne <rosenne@NetVision.net.il>, WWW-International List <www-international@w3.org>
On Oct 17, 9:21am, Jonathan Rosenne wrote:

> Bert Bos wrote:
> > However, there is a problem: a conflict between case-insensitivity and
> > allowing non-ASCII characters.
>
> I don't believe there is added value in case-insensitivity this day and
> age. [...] I suggest that the class names should be defined as case sensitive.
>
> But there is another problem with internationalized names: UCS defines a
> non-unique coding. Some composite characters have at least two valid
> representations, the composed character and the base character followed
> by diacritics.

True, and there are also multiple representations because of the compatibility zone. So for example U+0627 is the Arabic letter Alef, but so are U+FE8D (Alef isolated form) and U+FE8E (Alef final form), the latter two being in the compatibility zone, which has all (two or four) contextual forms of each letter.

Now I concede that the contextual forms are glyph identifiers, not characters, and have no real business being in a coded character set standard in the first place, but there we are.

Another wild gotcha which I discovered while flipping through the Unicode books: U+10A0 to U+10C5 is the Georgian archaic uppercase alphabet, while U+10D0 to U+10F0 serves as both the Georgian archaic lowercase alphabet and the modern Georgian alphabet, which is unicameral (has no case). The Unicode case table says

"Note: the modern Georgian alphabet is effectively caseless. Georgian SMALL LETTERs should not be upper cased to CAPITAL LETTERs."

Another relevant quote from the Unicode standard, on the subject of case conversion:

"Because there are many more lowercase forms than there are upper, it is recommended that the lowercase be used for normalisation rather than the uppercase, such as when strings are case-folded for loose comparison or indexing."
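As a rough illustration of the two problems above (composed versus decomposed sequences, and compatibility forms such as the Arabic contextual variants), here is a minimal sketch of a loose name comparison. It assumes Python's standard `unicodedata` module and the NFKC normalization form, both of which postdate this message; `loose_key` and `same_name` are hypothetical helper names, not part of any CSS or Unicode specification.

```python
import unicodedata


def loose_key(name: str) -> str:
    """Hypothetical helper: build a key for loose comparison of names."""
    # NFKC composes base + diacritic sequences into one canonical form
    # and replaces compatibility characters (for example the Arabic
    # presentation forms U+FE8D and U+FE8E) with their nominal character.
    normalized = unicodedata.normalize("NFKC", name)
    # casefold() is a lowercase-oriented fold, in line with the
    # recommendation to normalise towards lowercase, not uppercase.
    return normalized.casefold()


def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: True if two names match under loose_key."""
    return loose_key(a) == loose_key(b)


if __name__ == "__main__":
    # Precomposed e-acute versus "e" followed by a combining acute accent.
    print(same_name("caf\u00e9", "cafe\u0301"))
    # Nominal Arabic Alef versus its isolated and final presentation forms.
    print(same_name("\u0627", "\ufe8d"), same_name("\u0627", "\ufe8e"))
```

Whether folding of this kind is desirable for class names is exactly the open question in this thread; a case-sensitive definition would drop the `casefold()` step and keep only the normalization.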
--
Chris Lilley, W3C                          [ http://www.w3.org/ ]
Graphics and Fonts Guy            The World Wide Web Consortium
http://www.w3.org/people/chris/              INRIA, Projet W3C
chris@w3.org                       2004 Rt des Lucioles / BP 93
+33 93 65 79 87            06902 Sophia Antipolis Cedex, France
Received on Thursday, 17 October 1996 09:01:18 UTC