[Prev][Next][Index][Thread]

Re: Case sensitivity in markup



<bosak@atlantic-83.Eng.Sun.COM> recently wrote:

>From the Unicode Standard 2.0 (July 1996), Section 4.1:

>   Because there are many more lowercase forms than there are
>   uppercase or titlecase, it is recommended that the lowercase form
>   be used for normalization, such as when strings are case-folded
>   for loose comparison or indexing.

Makes sense when "normalization" means "the best case to keep your
master data in, since it's harder to convert other forms to this
one (adding information) rather than vice versa (supressing
information)".  But that doesn't apply here.  If my case normalization
drops accents going to upper case, how can I normalize to lower case?
I don't have an algorithm to regain the correct accents when my parser
must convert the occasional uppercase word back to lower case to
recognize it.

The other examples of conversion problems normalizing to uppercase
suggest that the only universal solution will be to disallow
case normalization.  (Unix forever; down with DOS? :-) )

_Personally_, I don't care.

Dave Peterson
SGMLWorks!

davep@acm.org