- From: Jon Hanna <jon@hackcraft.net>
- Date: Tue, 2 Nov 2004 10:10:20 -0000
- To: "'Elliotte Rusty Harold'" <elharo@metalab.unc.edu>, "'Robin Berjon'" <robin.berjon@expway.fr>
- Cc: <www-tag@w3.org>
> they will. That being said, I'd be reluctant to ask the TAG to
> modify the finding to specifically address Java. The best it could do
> would be to say "locales are evil, and the best way of using them is
> not at all".
>
> I don't think the finding should specifically address Java, either.
> However, it needs to be aware of how its wording will be interpreted
> by Java programmers, and choose the phrasing accordingly. The current
> statement that "Languages are compared case insensitively." will
> cause Java programmers to write
> value.toUppercase().equals(otherValue) which will fail in some
> environments. If instead it were to write, for example, "Values of
> the language attribute are compared by first converting the
> characters a-z to the characters A-Z and then comparing for string
> equality" more Java programmers would get this right. I'm still not
> perfectly happy with that wording. I'd like to say that non-ASCII
> characters are not changed, but you get the idea.

Non-ASCII characters are invalid in that context, but just stating that doesn't solve the locale problem. i and I are ASCII characters, yet changing their case yields İ and ı respectively in Turkic locales.

There's also the possibility of a relatively smart Germanic locale's lower-case routine deciding that SS should be lower-cased to ß in a given case. The chance is slight, though: really it would only happen if a spell-checker decided that a word should have a ß, which wouldn't happen with an RFC 3066 tag.

"The letters a-z are interpreted identically to the corresponding letters from the range A-Z. There is a convention for casing depending on whether the characters are from an ISO 639 code, an ISO 3166 code, or a tag registered with IANA, but this convention should not be relied on, as tags which don't follow it are common".

BTW, an update to RFC 3066, which allows for information on the script used, is in the works.

Regards,
Jon Hanna
<http://www.selkieweb.com/>
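The comparison rule discussed in this thread can be sketched in Java. The rule is to fold only a-z to A-Z and leave every other character alone. This is a minimal illustration and not code from the thread. The class `LangTagCompare` and its methods are hypothetical names. The sketch assumes a modern JDK (Java 7 or later, for `Locale.forLanguageTag`). It contrasts the locale-sensitive `toUpperCase` pitfall with `Locale.ROOT` and with an explicit ASCII-only comparison.

```java
import java.util.Locale;

// Hypothetical helper class, for illustration only.
public class LangTagCompare {

    // Compare two language tags by folding only the ASCII letters a-z to A-Z.
    // Every other character is compared unchanged, so the result never
    // depends on the JVM's default locale.
    static boolean equalsIgnoreAsciiCase(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            if (toAsciiUpper(a.charAt(i)) != toAsciiUpper(b.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Map a-z to A-Z; any other char (including non-ASCII) is returned as-is.
    static char toAsciiUpper(char c) {
        return (c >= 'a' && c <= 'z') ? (char) (c - ('a' - 'A')) : c;
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Locale-sensitive upper-casing: in a Turkish locale "i" becomes
        // U+0130 (capital I with dot above), so this comparison fails.
        boolean naive = "fi".toUpperCase(turkish).equals("FI");

        // Locale.ROOT requests locale-neutral case mapping.
        boolean root = "fi".toUpperCase(Locale.ROOT).equals("FI");

        // Explicit ASCII-only folding, independent of any locale.
        boolean ascii = equalsIgnoreAsciiCase("fi", "FI");

        System.out.println(naive + " " + root + " " + ascii);
    }
}
```

This should print `false true true` whatever the default locale is. Turkish upper-casing turns "fi" into "Fİ", while the two locale-independent approaches match. Because the sketch folds only ASCII, a non-ASCII character such as ı is never treated as equal to an ASCII letter, which is consistent with such characters being invalid in a tag.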
Received on Tuesday, 2 November 2004 10:10:25 UTC