Re: 2 many language tags for Norwegian

From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Fri, 25 Apr 2008 19:49:33 +0200
To: www-international@w3.org
Message-ID: <fut5e6$91k$1@ger.gmane.org>

Leif Halvard Silli wrote:

> If I come to a web site which sends out 'en', with a web browser
> asking for 'en-GB', won't I then receive 'en'? Yes I will.

Hopefully.  IIRC Google treated en-GB-oed as unknown (= not "en")
when searching for "en" results, but I can't check it at the moment.

> Thus it is better to come to a web site offering 'no' with a web
> browser asking for 'no-nyn' or 'no-NN' and be offered whatever
> is disguised as 'no', than to be served the page in an entirely
> other language.

When some taggers abuse "nn" for unknown it's certainly bad for
real "nn" applications.  FWIW, en-UK also is not what many folks
would think; "no" is not the only language subtag with such
issues.  Some cases of de / gsw / lb / nds / ... can also be
"less than obvious" (putting it as mildly as possible, nothing is
wrong with these subtags).

But you can set your browser to permit nb, nn, and no.  How some 
taggers abuse nn is a matter of education, as for s/UK/GB/g.    
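
A minimal sketch of why listing all three tags matters, assuming the
site does RFC 4647 "lookup"-style matching (real servers may match
differently).  The function name lookup() and the sample tag lists
are made up for illustration:

```python
def lookup(priority_list, available):
    """Return the first offered tag reached by truncating each requested range.

    Illustrative only: a simplified form of RFC 4647 lookup matching,
    without quality values or wildcard handling.
    """
    offered = {tag.lower(): tag for tag in available}
    for language_range in priority_list:
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in offered:
                return offered[candidate]
            # Drop the last subtag and try the shorter range.
            subtags.pop()
            # A single-character subtag left at the end is dropped as well.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return None


print(lookup(["en-GB"], ["en", "de"]))           # truncates en-GB to en
print(lookup(["nn"], ["no", "en"]))              # nn never truncates to no
print(lookup(["nn", "nb", "no"], ["no", "en"]))  # explicit "no" in the list
```

Truncation only ever shortens a tag, so a request for "nn" alone
cannot reach a page tagged "no"; that is why "no" has to be listed
explicitly in the browser settings.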

  [region codes]
>> Unfortunately, it cannot be done.
> I am at least glad to hear you say 'unfortunately'.

I'm not.  Region codes are a can of worms, try to find Somaliland,
Kosovo, or SMOM, and you get the tip of the biggest iceberg ever
coming near to Norway... :-)

> Norway are "two different places" when it comes to this 
> particular issue.

With the new RFC 4646 rules those "places" are called "variants";
as you have written, nn vs. nb are not really geographic regions.

But after a decision that something is a language you are out of
luck again, a language is not a "variant".  The grandfathered tags
no-bok and no-nyn were early RFC 3066 attempts in this direction.

The new rules require that variants consist of 5..8 letters or
digits (or 4 characters starting with a digit).  Won't help you,
because as soon as something is a language it can't be added as
a variant.
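
For illustration, the RFC 4646 ABNF for this rule is
variant = 5*8alphanum / (DIGIT 3alphanum).  A rough sketch of a
syntax check follows; the function name is made up, and passing the
check only means a subtag is well-formed, not that it is registered:

```python
import re

# variant = 5*8alphanum / (DIGIT 3alphanum), per the RFC 4646 ABNF.
VARIANT_RE = re.compile(r"(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})")


def is_wellformed_variant(subtag):
    """Check only the syntax of a variant subtag, not its registration."""
    return VARIANT_RE.fullmatch(subtag) is not None


for subtag in ("nyn", "bok", "1901", "abcde"):
    print(subtag, is_wellformed_variant(subtag))
```

The three-letter "nyn" and "bok" fail the syntax check, which is one
reason no-nyn and no-bok can only exist as grandfathered whole tags.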

The opposite works, variants or grandfathered predecessors can
be deprecated in favour of an identified language - that's what
happened with no-nyn and no-bok under the old RFC 3066 rules.

The situation might be bad for your purposes, but IMO one thing
is sure, adding more tags would only make it worse.

> what shall we do then?

Educate folks how it works.  There are various participants from
Norway in the relevant ISO committees, and it is simply not okay
if the government site gets it wrong, this is not rocket science.

> perhaps Norwegian should be considered a "Macro language", and 
> extended-language tags be taken into use to denote each variant?

Another wormhole, inactive at the moment, hopefully it will never
go live.  What you would get is redundant info, "nb" is shorter
than "no-nb".  Three bytes, not the end of the world even if it
affects billions of pages loaded billions of times for some TB
of unnecessary traffic.

But it can be a disaster if the relation has to be modified later, 
because nobody is going to update the billions of pages.  That is
irrelevant for "no", but for an erroneous language subtag "zh" it
is a major headache ("zh" roughly means China and is no language),
so again "not making it worse" is the best solution - if dumping
the complete scheme as a failure is ruled out as an option, YMMV.

> BCP 47 now says that nb and nn are preferred

Yes, but the old tags are still valid forever, the idea is again
that nobody will ever update billions of old pages.    
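
A small sketch of how a consumer can cope with old pages: map the
grandfathered tags to their preferred values when reading them.  Only
the two mappings discussed here are included; the table name and
function name are made up for illustration:

```python
# Grandfathered tags from this thread and their preferred replacements.
PREFERRED_VALUE = {
    "no-bok": "nb",
    "no-nyn": "nn",
}


def canonicalize(tag):
    """Return the preferred tag for a deprecated one, else the tag unchanged."""
    return PREFERRED_VALUE.get(tag.lower(), tag)


for tag in ("no-nyn", "NO-BOK", "nb", "no"):
    print(tag, "->", canonicalize(tag))
```

The old tags stay valid on the page; only the reader of the tag does
the translation.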

> I would like to propose that 'no-nyn' and 'no-bok' be made the
> preferred codes.

Not possible under the current rules, you'd have to convince the
IETF LTRU WG to permit this in a successor of RFC 4646, and then
go to the languages list to request it.  

In theory, that is; in practice it won't fly for obvious reasons:
no-nyn and no-bok are not ISO 639 language codes, and it is better
to use codes that also work outside of RFC 4646 tags, i.e. "nb"
and "nn".

If you nevertheless want a fight you could still try to persuade
ISO 639 that "nb" and "nn" are not languages but "dialects", and
that would solve the issue (= folks in Norway would kill you ;-)

 Frank
Received on Friday, 25 April 2008 17:47:39 GMT