- From: Addison Phillips <addison@yahoo-inc.com>
- Date: Thu, 15 Nov 2007 15:17:17 -0800
- To: John Cowan <cowan@ccil.org>
- CC: fantasai <fantasai.lists@inkedblade.net>, www-style@w3.org, "'WWW International'" <www-international@w3.org>
John Cowan wrote:
>
>> I'd be happy with that if [a-z] and [A-Z] matched each other and didn't
>> match anything else. But it seems that's not the case in Unicode.
>
> Well, looking at http://www.unicode.org/Public/5.0.0/ucd/CaseFolding.txt

Yes, that's exactly the reference. I didn't look at the default case folding of U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE), which I should have done. For the record, it's:

0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE

This means that WİDTH doesn't match WIDTH or width.

> I find that the basic Latin letters do match each other and nothing
> else, if you ignore the language-specific foldings, with one exception.
> U+212A KELVIN SIGN, which looks exactly like "K" and shouldn't exist
> anyhow (it's compatibility equivalent to a proper "K") is case-folded
> to "k". I consider that to come under the heading of the Right Thing.

Compatibility characters always present a problem of this sort. I think this is also the Right Thing.

>
> It's also true that some ligatures are case-folded to their spelled out
> equivalents: for example, U+FB00 LATIN SMALL LIGATURE FF is case-folded
> to simple "ff".
>

This is actually a Good Thing too.

Addison

--
Addison Phillips
Globalization Architect -- Yahoo! Inc.
Chair -- W3C Internationalization Core WG

Internationalization is an architecture.
It is not a feature.
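The foldings discussed in this message can be checked with a short script. This is only a sketch, not part of the original exchange: it assumes Python 3, whose built-in str.casefold() applies the default full case folding (the C and F entries of CaseFolding.txt), using whatever Unicode version ships with the interpreter rather than 5.0.0 specifically. The helper name caseless_match is made up for illustration.

```python
def caseless_match(a: str, b: str) -> bool:
    """Compare two strings under default full case folding (hypothetical helper)."""
    return a.casefold() == b.casefold()


def code_points(s: str) -> list[str]:
    """Render a string as a list of U+XXXX code points for inspection."""
    return [f"U+{ord(c):04X}" for c in s]


# U+0130 folds to "i" followed by U+0307 COMBINING DOT ABOVE,
# so a keyword containing it does not match the plain ASCII spelling.
dotted_width = "W\u0130DTH"
print(code_points("\u0130".casefold()))        # ['U+0069', 'U+0307']
print(caseless_match(dotted_width, "WIDTH"))   # False
print(caseless_match(dotted_width, "width"))   # False

# The basic Latin letters still match each other.
print(caseless_match("WIDTH", "width"))        # True

# U+212A KELVIN SIGN folds to a plain "k".
print(caseless_match("\u212A", "k"))           # True

# U+FB00 LATIN SMALL LIGATURE FF folds to the spelled-out "ff".
print(caseless_match("\uFB00", "FF"))          # True
```

Note that this covers only the default foldings; the language-specific (T) entries for Turkish and Azeri are not applied by str.casefold().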
Received on Thursday, 15 November 2007 23:18:39 UTC