Re: A note on case sensitivity



At 12:58 AM 25/10/96 EDT, lee@sq.com wrote:

>Case insensitivity is not well defined for Unicode/ISO 10646 as a whole
>and really only makes sense when you have a specific language -- but
>cross references from a French section to a Swedish section of a
>document might then have different rules for case sensitivity
>(whether accents are retained in upper case, for example).

This is one of the downsides of allowing ISO 10646 for attribute and element
names that has to be faced: either you make naming conventions locale specific
or you drop case matching. I feel that the advantages XML could offer by
providing a full range of 10646 characters for naming, etc., more than
outweigh the disadvantage of having to be case sensitive in element and
attribute names, etc.
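
To make the trade-off concrete, here is a rough sketch in Python (not part of
any XML or SGML specification). The helper names are hypothetical, and it
assumes Python's built-in string methods, which apply Unicode default,
locale-independent case mappings. It shows that exact matching needs no
locale knowledge, while a single folding rule cannot satisfy every convention,
such as upper-case French written without accents.

```python
# Hypothetical helpers for illustration only. Python's str methods apply
# Unicode default (locale-independent) case mappings.

def names_match_exact(a: str, b: str) -> bool:
    """Case-sensitive comparison: needs no locale knowledge."""
    return a == b


def names_match_folded(a: str, b: str) -> bool:
    """Case-insensitive comparison using Unicode default case folding."""
    return a.casefold() == b.casefold()


# Accent retained in upper case: default folding matches the two names.
accented_upper = "\u00c9COLE"   # E-acute followed by "COLE"
lower_name = "\u00e9cole"       # e-acute followed by "cole"
print(names_match_folded(accented_upper, lower_name))

# Accent dropped in upper case: default folding does not match them.
print(names_match_folded("ECOLE", lower_name))

# Default mappings are not always reversible: sharp s upper-cases to "SS".
print("stra\u00dfe".upper().lower())
```

With case-sensitive names only names_match_exact is needed, so none of these
locale questions arise.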

For the ERN extensions for SGML another pair of classes, NAMESTRT and
NAMECHAR, was introduced for languages where there is no equivalence between
uppercase and lowercase (e.g. CJK languages). Whilst this covers a large
number of languages it does not cover the Quebec/France variants Lee
mentioned. This example is one reason why the concept of document-specific
case rules becomes important in multilingual document sets.
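
As a small illustration of characters with no case equivalence, the following
Python sketch tests whether a character has any case mapping at all. The
function name has_case is hypothetical, and the check relies on Python's
Unicode default mappings rather than on anything defined by SGML.

```python
def has_case(ch: str) -> bool:
    """Return True if the character changes under upper- or lower-casing."""
    return ch.upper() != ch or ch.lower() != ch


# Latin letters, accented or not, have case equivalents.
print(has_case("a"), has_case("\u00e9"))

# A CJK ideograph (U+6587) has none, so case folding cannot affect it.
print(has_case("\u6587"))
```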

Swedish and French, for example, should not be in the same "document" but
should be separate linked segments of a document with their own character
set case rules. The question is how you link them together. In SGML they
cannot be subdocuments of a master document because subdocs must share an
SGML declaration with the calling document. This is one of the "unnecessary"
restrictions that SGML97 should address.

----
Martin Bryan, The SGML Centre, Churchdown, Glos. GL3 2PU, UK 
Phone/Fax: +44 1452 714029   WWW home page: http://www.u-net.com/~sgml/
