- From: <DAVEP@acm.org>
- Date: Sun, 03 Nov 1996 22:38:11 -0600 (CDT)
- To: bosak@atlantic-83.Eng.Sun.COM
- Cc: W3C-SGML-WG@w3.org
<bosak@atlantic-83.Eng.Sun.COM> recently wrote: >From the Unicode Standard 2.0 (July 1996), Section 4.1: > Because there are many more lowercase forms than there are > uppercase or titlecase, it is recommended that the lowercase form > be used for normalization, such as when strings are case-folded > for loose comparison or indexing. Makes sense when "normalization" means "the best case to keep your master data in, since it's harder to convert other forms to this one (adding information) rather than vice versa (supressing information)". But that doesn't apply here. If my case normalization drops accents going to upper case, how can I normalize to lower case? I don't have an algorithm to regain the correct accents when my parser must convert the occasional uppercase word back to lower case to recognize it. The other examples of conversion problems normalizing to uppercase suggest that the only universal solution will be to disallow case normalization. (Unix forever; down with DOS? :-) ) _Personally_, I don't care. Dave Peterson SGMLWorks! davep@acm.org
Received on Sunday, 3 November 1996 23:38:15 UTC