
Re: 2 many language tags for Norwegian

From: Leif Halvard Silli <lhs@malform.no>
Date: Sat, 26 Apr 2008 02:47:08 +0200
Message-ID: <48127B8C.6070907@malform.no>
To: Frank Ellermann <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>
CC: www-international@w3.org

Frank Ellermann:
> Leif Halvard Silli wrote:
>
> > If I come to a web site which sends out 'en', with a web browser
> > asking for 'en-GB', won't I then recive 'en'? Yes I will.
>
> Hopefully, IIRC Google treated en-GB-oed as unknown (= not "en")
> when searching for "en" results.  I can't check it at the moment.
>   

I am uncertain what the oed (Oxford English Dictionary spelling) tag is 
for.

But to illustrate what I was talking about: on my Apache installation, 
when I set Firefox to prefer 'en' and add "AddLanguage en-GB .en-gb" to 
my Apache configuration file, and the only available English version is 
index.html.en-gb, then Firefox gets and opens that file.
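
As a rough illustration of the matching involved (a sketch in Python, 
not Apache's actual implementation): HTTP/1.1 (RFC 2616) says that a 
language range matches a tag if it equals the tag, or is a prefix of it 
followed by '-'. The function names are made up for this example, and 
q-values are ignored.

```python
def range_matches(language_range, tag):
    """RFC 2616 rule: exact match, or prefix followed by '-'."""
    r = language_range.lower()
    t = tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def pick_variant(accepted, available):
    """Return the first available tag matched by the ranges, in order.

    `accepted` is the client's list of language ranges, most preferred
    first; `available` is the list of tags the server has variants for.
    """
    for language_range in accepted:
        for tag in available:
            if range_matches(language_range, tag):
                return tag
    return None


print(pick_variant(["en"], ["en-gb"]))     # 'en' is a prefix of 'en-gb'
print(pick_variant(["nn"], ["no", "nb"]))  # no prefix relation at all
```

Under this rule 'en' finds 'en-gb', but 'nn' can never find 'no' or 
'nb', because the three tags share no prefix.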

This is the key thing. How can we get this to happen for Bokmål and 
Nynorsk?

The question is whether it is possible to get the browser to a) announce 
that it prefers 'nn', while b) at the same time falling back to 'no' or 
'nb' if 'nn' isn't available.

It should be simple. When I select 'de', I get de-AT if nothing else in 
German is available. When I select 'en', en-GB gets served if nothing 
else is available in English.
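
To make the wish concrete, here is a minimal sketch in Python of the 
behaviour I have in mind. The fallback table and the function names are 
my own assumptions for the sake of illustration; nothing like this is 
defined by RFC 4646 or built into Apache or any browser.

```python
# Hypothetical fallback table: which tags to try when the preferred
# Norwegian tag is not available. Purely an assumption for this sketch.
FALLBACKS = {
    "nn": ["no", "nb"],
    "nb": ["no", "nn"],
    "no": ["nb", "nn"],
}


def expand(preferred):
    """Append the fallbacks after the explicitly preferred tags."""
    expanded = [tag.lower() for tag in preferred]
    for tag in list(expanded):
        for fallback in FALLBACKS.get(tag, []):
            if fallback not in expanded:
                expanded.append(fallback)
    return expanded


def choose(preferred, available):
    """Return the first available tag from the expanded preferences."""
    by_lower = {tag.lower(): tag for tag in available}
    for tag in expand(preferred):
        if tag in by_lower:
            return by_lower[tag]
    return None


print(choose(["nn"], ["nn", "no"]))  # Nynorsk exists, so it is served
print(choose(["nn"], ["no", "en"]))  # falls back to 'no', not English
```

The explicit preferences always win; the fallbacks are only consulted 
once none of them is available.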


> When some taggers abuse "nn" for unknown it's certainly bad for
> real "nn" applications.

I have not encountered such a use of nn myself.

>   FWIW, en-UK also is not what many folks
> would think, it's not that "no" is the only language subtag with
> such issues.  Some cases of de / gsw / lb / nds / ... can be also
> "less than obvious" (putting it as mildly as possible, nothing is
> wrong with these subtags).
>   

The point I made about 'nb' and 'nn' not being "obvious" to Norwegians 
was mostly a side point.  In fact, I proposed to reuse 'NB' and 'NN' as 
region names - even if more obvious (for Norwegians) names exist.

> But you can set your browser to permit nb, nn, and no.  How some 
> taggers abuse nn is a matter of education, as for s/UK/GB/g.    
>   

I can set *my* browser to permit nb, nn and no. But not just any 
browser. Not on OS X, at least. On OS X, the browsers (Safari/WebKit and 
those that interact with the system - Camino/Opera) send only a single 
language in the Accept-Language header. Thus, if I want to *prefer* 
Nynorsk, I must place it as the first preference in the language list of 
OS X. Then, AFAIK, those browsers will ask for Nynorsk, and only 
Nynorsk. (And unless I place Nynorsk on top, I cannot have OS X prefer 
the Nynorsk interfaces of applications.)

When I set *my* browser to prefer nn,no,nb - in that order - and visit a 
web site running Apache that offers Norwegian, it happens that I get the 
page in English.  This is not strange, because when I look inside 
Apache 1.3 on my Mac, it has two AddLanguage options "ready": 'no' for 
Bokmål and 'nn' for Nynorsk.

I can't put 'no' on top in my browser either, because then I will not be 
served 'nn' whenever a page exists as both 'nn' and 'no'.

> > Norway are "two different places" when it comes to this 
> > particular issue.
>
> With the new RFC 4646 rules those "places" are called "variants",
> as you have written nn vs. nb are not really geographic regions.
>   

It seems to me that the variants are mostly meant for language variants 
used in sub-regions.

Nynorsk and Bokmål represent two different approaches to standardising 
the Norwegian language. As such, their differences could be compared, 
to some extent, to the fight between the two modern Greek norms. Both of 
those cover the whole of Greece.

[1] http://www.w3.org/International/articles/language-tags/#issues

> But after a decision that something is a language you are out of
> luck again, a language is no "variant".

The "decision that something is a language" did not need to mean that we 
should have all of nn, nb and no. We could have had only no.

Politically though, in Norway, nb and nn represent forms of the same 
language. Linguistically they are perhaps different languages, worthy of 
an nb and an nn.  Regardless, the focus is on "equal rights" and on 
upholding the Norwegian language law. Thus the political dimension is 
the most important one.

So what we need is something that works with the political understanding 
of what Nynorsk and Bokmål are.

>    The grandfathered tags
> no-bok and no-nyn were early RFC 3066 attempts in this direction.  [.. snip ...]
>   

An attempt in the "variant" direction? It seems more like an early 
attempt in the macrolanguage/extended-language direction.

Adding more tags would be bad, you said. I wish they had had the wisdom 
to say so when they proposed nb and nn, as we already had no-nyn and 
no-bok.


[...]
> > what shall we do then?
>
> Educate folks how it works.  There are various participants from
> Norway in the relevant ISO committees, and it is simply not okay
> if the government site gets it wrong, this is not rocket science.
>   

I have looked into this several times. And the rules confuse even me. 
(My first thought, when looking at the government web site, was to 
advise them to use 'no-nn'  ...)

> > perhaps Norwegian should be considerd a "Macro languge", and 
> > extended-language tags be taken into use to denote each variant?
>
> Another wormhole, inactive at the moment, hopefully it will never
> go live.

From what others say, it sounds as if this is a vain hope.

>   What you would get is redundant info, "nb" is shorter
> than "no-nb".  Three bytes, not the end of the world even if it
> affects billions of pages loaded billions of times for a some TB
> of unnecessary traffic.  
>   

I could imagine that it was thoughts like this that caused the "simple" 
solution we now have.

> But it can be a disaster if the relation has to be modified later, 
> because nobody is going to update the billions of pages.  That is
> irrelevant for "no", but for an erroneous language subtag "zh" it
> is a major headache ("zh" roughly means China and is no language),
> so again "not making it worse" is the best solution - if dumping
> the complete scheme as failure is ruled out as option, YMMV.
>   

Why would it have to be modified later? 'no' is always right - whether I 
write Nynorsk or Bokmål. The hypothetical no-nb would not be more right 
or wrong than the current 'nb' already is. So the hypothetical need for 
re-tagging would be the same.

> > I would like to propose that 'no-nyn' and 'no-bok' was made the
> > preferred codes.
>
> Not possible under the current rules, you'd have to convince the
> IETF LTRU WG to permit this in a successor of RFC 4646, and then
> go to the languages list to request it.  
>
> In theory, in practice it won't fly for obvious reasons:  no-nyn
> and no-bok are no ISO 639 language codes, it is better to use 
> codes working also outside of RFC 4646 tags, i.e. "nb" and "nn".
>
> If you nevertheless want a fight you could still try to persuade
> ISO 639 that "nb" and "nn" are no languages but "dialects", and
> that would solve the issue (= folks in Norway would kill you ;-)
>   

The problem is the opposite of what you say: Bokmål is using 'no', which 
is also the most compatible tag, while Nynorsk is left out in the cold 
as a 'dialect' with 'nn'. (By the way, if we wanted to be as compatible 
as possible, then I think we could devise 'nb' for Bokmål and 'no' for 
Nynorsk, because it seems that 'nb' is usually preferred over 'no' if 
both are available.)

ISO 639-3 already says that 'no' is a macrolanguage, and that the 
concrete manifestations of 'no' are 'nno' and 'nob'. And no one has been 
killed because of this. We must also assume that ISO 639-3 and ISO 639-1 
do not contradict each other.

What is important is that users get the best user experience, and that 
the tags can be used to implement the official language policy for the 
Norwegian language. The tags should not promote the "ousting" of one or 
the other language from being considered Norwegian.

In that regard, guess which language tag the Norwegian government 
website uses for Bokmål?

Yeah, right. It uses 'no'.
-- 
leif halvard silli
Received on Saturday, 26 April 2008 00:47:47 GMT
