- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 02 Nov 2004 15:28:19 +0900
- To: Chris Lilley <chris@w3.org>, Elliotte Harold <elharo@metalab.unc.edu>
- Cc: Norman Walsh <Norman.Walsh@Sun.COM>, www-tag@w3.org
At 12:30 04/10/27, Chris Lilley wrote:
>That would be before the Unicode case folding tables, then.
>
>EH> In this context, I think English rules make sense,
>
>These are not 'the English rules'. Unless English somehow acquired
>Deseret, Greek, Cyrillic and Armenian while I was not looking. These are
>the Universal Character set rules, which are entirely appropriate for
>syntactic items like URIs, language tags, and so forth.

The Unicode case folding table(s) are appropriate for some cases, but not for others. In particular, they are mainly defined for searching, so they may collapse more than necessary.

In the current case, rather than invoking English or Unicode, I think it's best to say that these tags are case-insensitive as defined by RFC 3066 or its successor. I have sent a mail to the authors of http://www.ietf.org/internet-drafts/draft-phillips-langtags-07.txt and to the relevant mailing list, copying the people involved in this thread, so that this wording can be added in the next draft.

As for RFC 3066, the most reasonable thing to do is to assume that, from context, when it says "case insensitive" it means "case insensitive as usually used for US-ASCII only", or whatever exact wording you prefer. There is absolutely no doubt that every participant in the creation and discussion of RFC 3066 was assuming this, so much so that none of them thought to write it down.

As for English, English isn't just US-ASCII. If you look at a good dictionary, you'll see that it occasionally includes words with diacritics.

Regards,
Martin.
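The message distinguishes two readings of "case insensitive" for language tags. One is the US-ASCII-only reading assumed for RFC 3066. The other is Unicode case folding, which "may collapse more than necessary". The sketch below contrasts them. It is not taken from RFC 3066 or any library. The helper names `ascii_lower`, `tags_equal_ascii` and `tags_equal_casefold` are hypothetical, and Python's built-in `str.casefold` stands in for Unicode full case folding. The example uses U+212A KELVIN SIGN, which Unicode folds to ASCII "k". Under folding, a non-ASCII string therefore compares equal to the tag "ko", while the ASCII-only comparison keeps them distinct.

```python
import string

# Hypothetical helper: map only A-Z to a-z and leave every other
# character untouched (the "US-ASCII only" reading of case insensitivity).
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def ascii_lower(s: str) -> str:
    """Lowercase US-ASCII letters only."""
    return s.translate(_ASCII_FOLD)

def tags_equal_ascii(a: str, b: str) -> bool:
    """Compare two tags case-insensitively over US-ASCII only."""
    return ascii_lower(a) == ascii_lower(b)

def tags_equal_casefold(a: str, b: str) -> bool:
    """Compare two tags using Unicode full case folding (str.casefold)."""
    return a.casefold() == b.casefold()

if __name__ == "__main__":
    # Ordinary ASCII tags: both readings agree.
    print(tags_equal_ascii("en-US", "EN-us"))      # True
    print(tags_equal_casefold("en-US", "EN-us"))   # True

    # U+212A KELVIN SIGN followed by "o": not an ASCII tag at all,
    # yet Unicode case folding maps it onto "ko".
    tricky = "\u212Ao"
    print(tags_equal_ascii(tricky, "ko"))          # False
    print(tags_equal_casefold(tricky, "ko"))       # True
```

The ASCII-only version rejects the Kelvin-sign string, which is arguably the right outcome for a syntactic identifier whose valid characters are US-ASCII anyway.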
Received on Tuesday, 2 November 2004 06:30:37 UTC