Re: XHTML/XML comment (case sensitivity and I18N, redux)

On Tue, 1 Feb 2000, Dan Oscarsson wrote:

> >	-- XML WG decisions of Wed. Sep. 10
> >	http://www.w3.org/XML/9712-reports.html#ID40
> 
> Well I looked at the above document and some of the reasons give some
> thought and some are not right:

They were not listed as reasons, but as "points made".  In fact,
points both pro and con are mentioned.  The *issue*, however, was a
technical one, perhaps unique to SGML itself:

: The Question:
:      Modify the XML specification to achieve the effect of 
:      NAMECASE GENERAL NO in SGML.

SGML provides for a limited amount of "case insensitivity".  This is
controlled through explicitly specified sets of character classes that
are *formally* considered "uppercase" and "lowercase" equivalents of
each other, for the purpose of case *substitution* to "uppercase" by
the parser.  The default sets comprise the characters in the familiar
Roman (ASCII) alphabet; these can be augmented by two extra sets in
1-to-1 correspondence with each other.  It's important to emphasize
that this correspondence is purely formal: e.g. one could specify '@'
as the "uppercase" substitute for "lowercase" '^', if one wanted -
this, btw, would define '@' and '^' as "name characters" too.
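
To make the "purely formal" point concrete, here is a rough sketch in
Python of what such a substitution amounts to.  The names
make_substitution and fold_name are mine, purely for illustration; this
is not how any SGML parser is written, and it ignores the distinction
between name start characters and other name characters.

```python
import string


def make_substitution(lower_extra="", upper_extra=""):
    """Build a formal "lowercase" -> "uppercase" translation table.

    The default pairs are the 26 ASCII letters.  The two extra sets
    (hypothetical parameters) must be in 1-to-1 correspondence; nothing
    requires them to be letters at all.
    """
    if len(lower_extra) != len(upper_extra):
        raise ValueError("extra sets must be in 1-to-1 correspondence")
    pairs = dict(zip(string.ascii_lowercase, string.ascii_uppercase))
    pairs.update(zip(lower_extra, upper_extra))
    return str.maketrans(pairs)


def fold_name(name, table):
    """Substitute every "lowercase" character by its "uppercase" partner."""
    return name.translate(table)


default_table = make_substitution()
custom_table = make_substitution("^", "@")  # '@' is the "uppercase" of '^'

folded_default = fold_name("Body", default_table)
folded_custom = fold_name("a^b", custom_table)
```

With the default table "Body" and "BODY" become the same name; with the
custom table "a^b" and "A@B" do as well, although no natural language
treats '@' as a capital '^'.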

Maintaining this case substitution feature (NAMECASE GENERAL YES) in
XML would have meant either making a special case of the Roman
alphabet within the entire Unicode domain, or providing a comprehensive
definition of case substitution maps across this domain - which the
received opinion of Unicode experts suggested was unwise.
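
A few well-known cases show why a comprehensive 1-to-1 map is hard to
come by.  The snippet below is only an illustration, using Python's
built-in str.upper() and str.lower(), which apply the default
(locale-independent) Unicode case mappings; the variable names are
mine.

```python
sharp_s = "\u00df"    # LATIN SMALL LETTER SHARP S
dotless_i = "\u0131"  # LATIN SMALL LETTER DOTLESS I
dotted_cap_i = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# One character uppercases to two, so the map is not 1-to-1 in length.
sharp_s_upper = sharp_s.upper()

# Two distinct "lowercase" characters share one "uppercase" partner,
# so the map cannot be inverted.
shared_upper = (dotless_i.upper(), "i".upper())

# Lowercasing a single character yields a base letter plus a
# combining mark.
dotted_lower_len = len(dotted_cap_i.lower())
```

None of these fit the SGML model of two sets of single characters in
strict 1-to-1 correspondence.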

Basically, NAMECASE GENERAL NO was the prudent way to take up arms
against a sea of troubles.

> XHTML says that lower case should be used. But I can see no
> definition on lower case!

The definition is formal, derived from SGML, and fixed in the SGML
declaration profile for XML.

> You cannot define lower case without taking all the problems with
> case insensitivity with you, because you still have to define the
> mapping from upper case to lower case.

True, in general, and perhaps the XHTML spec could be clearer.  Its
use of "lowercase" is only in reference to those specific characters
that have been used in a case insensitive fashion previously (i.e.
there is no attempt to invoke a definition of "lowercase" as might
apply to all of Unicode, for instance).
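
Read that way, "lowercase" needs no more machinery than the following
sketch, which touches only the 26 ASCII capitals and leaves every other
character alone.  The function name ascii_lower is hypothetical, and
this is my reading of the intent, not wording taken from the XHTML
spec.

```python
def ascii_lower(name):
    """Lowercase only 'A'..'Z'; every other character passes through."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in name
    )


html_name = ascii_lower("BODY")
mixed_name = ascii_lower("\u0130D")  # the non-ASCII capital is untouched
```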


Arjun

Received on Tuesday, 1 February 2000 15:11:31 UTC