- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Fri, 25 Apr 2008 19:49:33 +0200
- To: www-international@w3.org
Leif Halvard Silli wrote:

> If I come to a web site which sends out 'en', with a web browser
> asking for 'en-GB', won't I then receive 'en'? Yes I will.

Hopefully. IIRC, Google treated en-GB-oed as unknown (= not "en") when searching for "en" results. I can't check it at the moment.

> Thus it is better to come to a web site offering 'no' with a web
> browser asking for 'no-nyn' or 'no-NN' and be offered whatever
> is disguised as 'no', than to be served the page in an entirely
> other language.

When some taggers abuse "nn" for "unknown", it's certainly bad for real "nn" applications. FWIW, en-UK is also not what many folks would think; "no" is not the only language subtag with such issues. Some cases of de / gsw / lb / nds / ... can also be "less than obvious" (putting it as mildly as possible; nothing is wrong with these subtags themselves).

But you can set your browser to accept nb, nn, and no. How some taggers abuse nn is a matter of education, as with s/UK/GB/g.

[region codes]
>> Unfortunately, it cannot be done.

> I am at least glad to hear you say 'unfortunately'.

I'm not. Region codes are a can of worms: try to find Somaliland, Kosovo, or SMOM, and you get the tip of the biggest iceberg ever to come near Norway... :-)

> Norway are "two different places" when it comes to this
> particular issue.

With the new RFC 4646 rules those "places" are called "variants"; as you have written, nn vs. nb are not really geographic regions. But once something has been decided to be a language, you are out of luck again: a language is not a "variant". The grandfathered tags no-bok and no-nyn were early RFC 3066 attempts in this direction. The new rules require that variants consist of 5 to 8 letters or digits (or 4 characters starting with a digit, such as "1901"). That won't help you, because as soon as something is a language it can't be added as a variant.
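Leif's en-GB question and the advice to let the browser accept nb, nn, and no both turn on how a server matches the browser's Accept-Language list against the tags it actually has. The following is a minimal Python sketch, assuming the server uses something like the RFC 4647 "Lookup" scheme (truncate each requested tag from the right until something matches). The helper names `parse_accept_language` and `lookup` are hypothetical, the header parsing is deliberately simplified, and real servers may use a different matching strategy.

```python
from typing import Optional


def parse_accept_language(header: str) -> list[str]:
    """Hypothetical helper: return language ranges from an Accept-Language
    header, highest q-value first. Simplified: no tag validation, a malformed
    q-value counts as q=0, and ranges with q=0 are dropped."""
    entries = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0]
        if not tag:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            entries.append((q, position, tag))
    # Higher q first; equal q keeps the order given in the header.
    entries.sort(key=lambda e: (-e[0], e[1]))
    return [tag for _, _, tag in entries]


def lookup(ranges: list[str], available: list[str],
           default: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper sketching RFC 4647 Lookup: for each requested range,
    drop subtags from the end until a tag the site offers matches."""
    offered = {tag.lower(): tag for tag in available}  # tags are case-insensitive
    for rng in ranges:
        if rng == "*":
            continue  # this sketch ignores the wildcard range
        subtags = rng.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in offered:
                return offered[candidate]
            subtags.pop()
            # A single-character subtag (e.g. "x") left at the end is removed too.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default
```

With this sketch, `lookup(["en-GB"], ["en", "de"])` returns `"en"`, which is the fallback Leif describes. A browser sending `nn, nb;q=0.9, no;q=0.8` gets `"no"` from a site that only offers `no` and `en`. But `lookup(["no-NN"], ["nn", "nb"])` returns `None`: truncation goes from "no-nn" to "no" and never reaches "nn", so listing nb, nn, and no explicitly is what makes the fallback work.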
The opposite works: variants or grandfathered predecessors can be deprecated in favour of an identified language. That's what happened with no-nyn and no-bok under the old RFC 3066 rules. The situation might be bad for your purposes, but IMO one thing is sure: adding more tags would only make it worse.

> what shall we do then?

Educate folks about how it works. There are various participants from Norway in the relevant ISO committees, and it is simply not okay if the government site gets it wrong; this is not rocket science.

> perhaps Norwegian should be considered a "macro language", and
> extended-language tags be taken into use to denote each variant?

Another wormhole, inactive at the moment; hopefully it will never go live. What you would get is redundant info: "nb" is shorter than "no-nb". Three bytes, not the end of the world even if it affects billions of pages loaded billions of times, for some TB of unnecessary traffic. But it can be a disaster if the relation has to be modified later, because nobody is going to update the billions of pages. That is irrelevant for "no", but for an erroneous language subtag like "zh" it is a major headache ("zh" roughly means China and is not a language). So again "not making it worse" is the best solution, if dumping the complete scheme as a failure is ruled out as an option. YMMV.

> BCP 47 now says that nb and nn are preferred

Yes, but the old tags remain valid forever; again, the idea is that nobody will ever update billions of old pages.

> I would like to propose that 'no-nyn' and 'no-bok' was made the
> preferred codes.

Not possible under the current rules. You'd have to convince the IETF LTRU WG to permit this in a successor of RFC 4646, and then go to the languages list to request it. That's the theory; in practice it won't fly, for obvious reasons: no-nyn and no-bok are not ISO 639 language codes, and it is better to use codes that also work outside RFC 4646 tags, i.e. "nb" and "nn".
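Two of the rules discussed in this thread can be checked mechanically: the RFC 4646 shape of a variant subtag (5 to 8 alphanumerics, or 4 characters starting with a digit), and the replacement of the deprecated grandfathered tags no-bok and no-nyn by their preferred values nb and nn. The following is a minimal Python sketch. The mapping is hard-coded for the two Norwegian tags only, as an illustrative assumption. A real implementation would read the Preferred-Value fields from the IANA Language Subtag Registry, and `is_variant_subtag` / `canonicalize` are hypothetical names.

```python
import re

# RFC 4646 variant subtag shape: 5-8 alphanumerics, or a digit followed by
# 3 alphanumerics (e.g. "1901"). Matching is case-insensitive by construction.
VARIANT_RE = re.compile(r"(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})")

# Hypothetical excerpt of the registry: only the two Norwegian grandfathered
# tags from this thread, mapped to their registered Preferred-Value.
GRANDFATHERED_PREFERRED = {
    "no-bok": "nb",
    "no-nyn": "nn",
}


def is_variant_subtag(subtag: str) -> bool:
    """True if the subtag has the syntactic shape of an RFC 4646 variant.
    This checks syntax only, not whether the variant is registered."""
    return VARIANT_RE.fullmatch(subtag) is not None


def canonicalize(tag: str) -> str:
    """Replace a deprecated grandfathered tag by its preferred value;
    any other tag is returned unchanged."""
    return GRANDFATHERED_PREFERRED.get(tag.lower(), tag)
```

For example, `canonicalize("no-NYN")` returns `"nn"`, while `canonicalize("en-GB")` is returned as-is. `is_variant_subtag("1901")` is true, but `is_variant_subtag("nyn")` is false: a three-letter subtag does not have the shape of a variant. Old pages tagged no-nyn stay valid. The mapping only tells a processor what the preferred equivalent is.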
If you nevertheless want a fight, you could still try to persuade ISO 639 that "nb" and "nn" are not languages but "dialects", and that would solve the issue (= folks in Norway would kill you ;-)

Frank
Received on Friday, 25 April 2008 17:47:39 UTC