Re: Case sensitivity in markup from DAVEP@acm.org on 1996-11-04 (w3c-sgml-wg@w3.org from November 1996)

From: <DAVEP@acm.org>
Date: Sun, 03 Nov 1996 22:38:11 -0600 (CDT)
To: bosak@atlantic-83.Eng.Sun.COM
Cc: W3C-SGML-WG@w3.org
Message-id: <01IBFBJST0S60080MC@PASCAL.ACM.ORG>

<bosak@atlantic-83.Eng.Sun.COM> recently wrote:

>From the Unicode Standard 2.0 (July 1996), Section 4.1:

>   Because there are many more lowercase forms than there are
>   uppercase or titlecase, it is recommended that the lowercase form
>   be used for normalization, such as when strings are case-folded
>   for loose comparison or indexing.

That makes sense when "normalization" means "the best case to keep
your master data in, since it's harder to convert other forms to this
one (adding information) than vice versa (suppressing information)".
But that doesn't apply here.  If my case normalization
drops accents going to upper case, how can I normalize to lower case?
I don't have an algorithm to regain the correct accents when my parser
must convert the occasional uppercase word back to lower case to
recognize it.
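
To illustrate the loss of information, here is a minimal Python sketch.
It assumes a hypothetical convention in which uppercasing also strips
accents; the helper name upper_dropping_accents is invented for this
example and is not taken from any SGML tool or from the Unicode Standard.

```python
import unicodedata


def upper_dropping_accents(s):
    """Hypothetical uppercase rule that also drops accents (an assumption)."""
    # Decompose each accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", s)
    # Discard the combining marks, keeping only the base letters.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.upper()


# Four distinct lowercase spellings...
words = ["cote", "côte", "coté", "côté"]
folded = {w: upper_dropping_accents(w) for w in words}

# ...all collapse to the same uppercase form.
distinct_upper = set(folded.values())

# Lowercasing that form yields only the unaccented spelling; nothing in
# the uppercase string says which of the four originals was meant.
round_trip = next(iter(distinct_upper)).lower()
```

Under this assumed rule the uppercase mapping is many-to-one, so a parser
that sees the uppercase word has no way to pick the right lowercase
spelling without a dictionary or some other outside information.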

The other examples of conversion problems when normalizing to
uppercase suggest that the only universal solution will be to
disallow case normalization.  (Unix forever; down with DOS? :-) )

_Personally_, I don't care.

Dave Peterson
SGMLWorks!

davep@acm.org

Received on Sunday, 3 November 1996 23:38:15 UTC