Re: A note on case sensitivity from Michael Sperberg-McQueen on 1996-10-29 (w3c-sgml-wg@w3.org from October 1996)

From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
Date: Tue, 29 Oct 96 17:39:05 CST
To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
Message-Id: <199610292353.SAA19721@www10.w3.org>

On Fri, 25 Oct 1996 22:41:12 -0400 <lee@sq.com> said:
>Tim Bray
>> Making XML markup case-sensitive is
>> clearly the *right* thing to do,
>
>I agree.

As do I.  I also think it's the most practicable thing to do.  We
will never have less XML legacy data to worry about than we do now
-- so it will never be easier to introduce case sensitivity
throughout the markup language than it is now.

>> but adds a lot of work for those who
>> want to interoperate with SGML and especially HTML.
>
>Well, the internationalised HTML working group is facing the same issue
>at exactly this moment, and seems to have reached the same conclusion:
>It looks like I18N HTML will have to have NAMECASE NO.

Let's go with them.

>> Failing that, I don't suppose there's any support for going back to
>> 7-bit characters, just for GI's and attribute names?
>
>If it would help, I would support it.

If we *have* to have case folding, which I strenuously dispute, then
the simplest fallback is to use the default case-folding tables of
the Unicode Consortium, which are not hard to get (they come on the
CD when you buy the book, nowadays, and I suspect they're even on
the net somewhere).  As the Unicode Standard Version 2.0 says
(section 4.1, Case, p. 4-2):  "In a few instances, upper- and
lowercase mappings may differ from language to language between
writing systems that employ the same letters.  Examples include
Turkish (... [dotted and dotless I]) and French (...).  However, in
general the vast majority of case mappings are uniform across
languages."
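
To make the fallback concrete, here is a minimal sketch, not a proposal for any parser. It assumes Python's built-in str.casefold() as a stand-in for the Unicode default case-folding tables. The names default_fold and turkish_lower are hypothetical helpers invented for this illustration, and turkish_lower covers only the dotted and dotless I rule mentioned in the quotation.

```python
def default_fold(name: str) -> str:
    """Language-independent folding via Python's built-in Unicode tables."""
    return name.casefold()


def turkish_lower(name: str) -> str:
    """Hypothetical Turkish tailoring (illustration only).

    Maps 'I' to dotless i (U+0131) and dotted capital I (U+0130) to 'i',
    then lowercases everything else with the default mapping.
    """
    return name.replace("I", "\u0131").replace("\u0130", "i").lower()


# The default tables are uniform for most scripts, e.g. Greek:
print(default_fold("ΔΕΛΤΑ") == default_fold("δελτα"))

# The same element name lowercases differently under the Turkish rule:
print(default_fold("TITLE"))
print(turkish_lower("TITLE"))
```

With the default tables, TITLE and title compare equal. Under the Turkish rule, TITLE comes out with a dotless i and no longer matches title. That is the kind of language-dependent exception the Standard describes.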

I don't think case-folding is essential to the utility or success of
XML.  Even if it is, though, I don't think it's more important than
internationalization.

Let's all take a deep breath.  Case-sensitive element names,
attribute names, attribute values.  We can live with that.  Element
names restricted to a subset of the alphabet (say, A-J, all
uppercase)?  I couldn't live happily with that, and I can't see
asking the native speakers of every language but English and Latin
to do so.


-C. M. Sperberg-McQueen

Received on Tuesday, 29 October 1996 18:53:20 UTC