1. Details.

> >On the other hand, in the list, there are a few items (such as en-us,
> >pt-br) that look perfectly fine. What was wrong with them?
>
> Capitalization.

No. (trying for equal brevity)

More completely:

102 0.015999% en-us.
122 0.010068% en-us

As I said in my original message, the second one has a space at the end. The first one has a period at the end. So both are ill-formed. The same is true of pt-br (extra space at end).

2. Correction.

A correction: these are actually Accept-Language values, not documents -- we get different results looking at documents.

A very interesting point, however, is that the errors here could be corrected if the browsers checked for well-formedness, or at least partial well-formedness, when allowing the user to pick his/her browser's language. That would eliminate a lot of cruft.

3. Guidance for User Agents.

This raises a point we should probably have language in 4646bis for. Here's rough text for it; I anticipate that this will generate some discussion ;-)

   When a user agent, such as a browser, allows users to enter a language tag by typing, the results SHOULD be checked for well-formedness. If the user agent is not regularly updated to the latest registry, it SHOULD NOT require validity, because that could exclude current, valid language tags. It is recommended, however, that the user be notified that the language tag may not be valid.

4. Basic Well-Formedness.

We may also want to have the notion of basic well-formedness, which is that part of validity which can be checked with a regular expression. The difference is that basic well-formedness doesn't check for multiple singleton extensions.

The value of doing this is that (a) it covers 99.999...% of the value of a well-formedness check, and (b) it is a much easier sell to implementers if all they need is a simple regex check.

Mark

Received on Sunday, 24 September 2006 20:15:10 GMT
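
As an illustration of sections 3 and 4 above, here is a rough Python sketch of what a regex-only "basic well-formedness" check and a user-agent entry check might look like. It is not taken from the message or from any specification text: the pattern is a simplified approximation of the langtag grammar (grandfathered tags are left out), and the names BASIC_LANGTAG, is_basic_well_formed, is_well_formed, check_user_entry and known_languages are all hypothetical. The validity step only looks at the primary language subtag, standing in for a possibly stale local copy of the registry.

```python
import re

# Simplified approximation of the langtag grammar (assumption: grandfathered
# tags are not handled). Whitespace and '#' comments are ignored by re.VERBOSE.
BASIC_LANGTAG = re.compile(r"""
    (?:
        (?:[a-z]{2,3}(?:-[a-z]{3}){0,3}            # language, optional extlangs
          |[a-z]{4,8})                             # longer language subtags
        (?:-[a-z]{4})?                             # script
        (?:-(?:[a-z]{2}|[0-9]{3}))?                # region
        (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*   # variants
        (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*       # extensions
        (?:-x(?:-[a-z0-9]{1,8})+)?                 # private use
      |
        x(?:-[a-z0-9]{1,8})+                       # private use only
    )
""", re.IGNORECASE | re.VERBOSE)


def is_basic_well_formed(tag):
    """Regex-only check; repeated singleton extensions are NOT detected."""
    return BASIC_LANGTAG.fullmatch(tag) is not None


def is_well_formed(tag):
    """Basic check plus rejection of repeated singleton extensions."""
    if not is_basic_well_formed(tag):
        return False
    seen = set()
    for subtag in tag.lower().split("-"):
        if subtag == "x":          # everything after 'x' is private use
            break
        if len(subtag) == 1:       # a singleton introducing an extension
            if subtag in seen:
                return False
            seen.add(subtag)
    return True


def check_user_entry(entry, known_languages=None):
    """Return (accepted, note) for a tag exactly as the user typed it.

    Ill-formed entries are refused. Entries whose primary language subtag
    is missing from known_languages (a hypothetical local registry
    snapshot) are still accepted, but with a warning note.
    """
    if not is_basic_well_formed(entry):
        return False, "not well-formed"
    primary = entry.lower().split("-")[0]
    if known_languages is not None and primary not in known_languages:
        return True, "may not be valid"
    return True, ""
```

With this sketch, "en-us" passes while "en-us." and "en-us " (trailing period, trailing space) are refused. A tag such as "en-a-bbb-a-ccc" passes the basic check and fails only the fuller one, which is the difference described in section 4.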