Re: Internationalized CLASS attributes from Chris Lilley on 1996-10-17 (www-international@w3.org from October to December 1996)

From: Chris Lilley <Chris.Lilley@sophia.inria.fr>
Date: Thu, 17 Oct 1996 15:00:51 +0200 (DST)
To: Jonathan Rosenne <rosenne@NetVision.net.il>, WWW-International List <www-international@w3.org>
Message-Id: <9610171500.ZM26061@grommit.inria.fr>

On Oct 17,  9:21am, Jonathan Rosenne wrote:

> Bert Bos wrote:

> > However, there is a problem: a conflict between case-insensitivity and
> > allowing non-ASCII characters.

> I don't believe there is added value in case-insensitivity this day and
> age. [...] I suggest that the class names should be defined as case
> sensitive.

> But there is another problem with internationalized names: UCS defines a
> non-unique coding. Some composite characters have at least two valid
> representations, the composed character and the base character followed
> by diacritics.

True, and there are also multiple representations because of the
compatibility zone. So for example U+0627 is the Arabic letter Alef,
but so is U+FE8D (Alef isolated form) and U+FE8E (Alef final form), the
latter two being in the compatibility zone which has all (two or four)
contextual forms.
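
To make the two kinds of duplication concrete, here is a minimal Python
sketch. It assumes the normalization forms provided by Python's standard
unicodedata module (NFC and NFKC), which are not something proposed in
this thread. The helper name same_class_name is hypothetical, and
e-acute is just one convenient composite character.

```python
import unicodedata

# One composite character, two valid encodings:
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# One Arabic letter, plus two compatibility-zone contextual forms:
alef = "\u0627"          # ARABIC LETTER ALEF
alef_isolated = "\ufe8d" # ARABIC LETTER ALEF ISOLATED FORM
alef_final = "\ufe8e"    # ARABIC LETTER ALEF FINAL FORM


def same_class_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after NFKC normalization.

    NFKC composes base + diacritic sequences and also replaces
    compatibility characters with their ordinary equivalents.
    """
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


# A plain code-point comparison treats the two encodings as different names.
print(composed == decomposed)                  # False
print(same_class_name(composed, decomposed))   # True
print(same_class_name(alef, alef_isolated))    # True
print(same_class_name(alef, alef_final))       # True
```

Note that canonical normalization alone (NFC) would unify the composed
and decomposed spellings but leave the contextual forms distinct; only
the compatibility forms (NFKC or NFKD) fold U+FE8D and U+FE8E into
U+0627.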

Now I concede that the contextual forms are glyph identifiers not
characters and have no real business being in a coded character set
standard in the first place, but there we are.

Another wild gotcha which I discovered while flipping through the
Unicode books: U+10A0 to U+10C5 is the Georgian archaic uppercase
alphabet, while U+10D0 to U+10F0 serves as both the Georgian archaic
lowercase alphabet and the modern Georgian alphabet, which is
unicameral (has no case).

The Unicode case table says "Note: the modern Georgian alphabet is
effectively caseless. Georgian SMALL LETTERs should not be upper
cased to CAPITAL LETTERs."

Another relevant quote from the Unicode standard, on the subject of case
conversion:

"Because there are many more lowercase forms than there are upper, it is
recommended that the lowercase be used for normalisation rather than the
uppercase, such as when strings are case-folded for loose comparison or
indexing."
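
A loose comparison along the lines of that recommendation might look
like the following Python sketch. The function names loose_equal and
fold_equal are hypothetical. The second function uses Python's
str.casefold(), a later and more aggressive folding that is an
assumption here and not part of the quoted text.

```python
def loose_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare after mapping both strings to lowercase."""
    return a.lower() == b.lower()


def fold_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare using Python's full case folding."""
    return a.casefold() == b.casefold()


# Non-ASCII letters with a one-to-one case mapping behave as expected.
print(loose_equal("\u00c9clair", "\u00e9CLAIR"))   # True

# Sharp s has no single-character lowercase counterpart of "SS",
# so plain lowercasing does not make these two spellings equal.
print(loose_equal("Stra\u00dfe", "STRASSE"))       # False
print(fold_equal("Stra\u00dfe", "STRASSE"))        # True
```

The sharp s case shows that even lowercase folding is only an
approximation for some scripts, which is one more reason to prefer
case-sensitive class names.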

-- 
Chris Lilley, W3C                          [ http://www.w3.org/ ]
Graphics and Fonts Guy            The World Wide Web Consortium
http://www.w3.org/people/chris/              INRIA,  Projet W3C
chris@w3.org                       2004 Rt des Lucioles / BP 93
+33 93 65 79 87            06902 Sophia Antipolis Cedex, France

Received on Thursday, 17 October 1996 09:01:18 UTC