Re: 2 many language tags for Norwegian

From: Leif Halvard Silli <lhs@malform.no>
Date: Fri, 25 Apr 2008 19:26:33 +0200
Message-ID: <48121449.4070203@malform.no>
To: John Cowan <cowan@ccil.org>
CC: www-international@w3.org

John Cowan 25-04-08 16:23:
> Leif Halvard Silli scripsit:
>
> >    * The reason why Google uses 'no' for 'nb', instead of 'nb' for 'nb',
> >      is not the one you mention.
>
> How do you know this?
>
> Employees of both Google and Yahoo made such comments on LTRU WG telcons.
>   

Perhaps only a side note, really, but the e-mail message I sent to 
Google in the summer of 2001, where I asked them for permission to start 
translating Google into Nynorsk, still sits in my mailbox. I first had to 
explain to them that Nynorsk exists. Et cetera. (I guess I could have 
pushed them harder, from the start, about doing these things right.)

I have not been on those telcons. But the "weight" argument can of 
course be used everywhere, in any context: "We choose to really care 
only about those that count for the most people." There is no particular 
logic to it. The outcome can be good or bad, logical or illogical. 
Therefore it matters what the options look like. And they are confusing 
and do not encourage perfect solutions.

Otherwise, I told you in my message how I know this. Google began with 
'nb' first.

And that points to another problem with the no/nb/nn approach. Suppose 
you start out with 'no', either because you use it for a Nynorsk-only or 
Bokmål-only site, or because you mix both language forms (a newspaper 
offering articles in both language forms, for instance). Then, at a 
later point, you decide to offer parallel versions in nn and nb. What 
do you have to do then? You must change all the 'no' tags to either 
'nn' or 'nb'. It will not be enough to change just the Nynorsk texts 
from 'no' to 'no-nyn' and let 'no' remain in use for Bokmål. One must 
switch *both* to new tags, with the risk that pages and language 
negotiation break, for a short while at least, in the browsers.

Or take English instead: You want to offer a GB version of a certain 
resource. Then you just add en-GB to the related pages. You do not need 
to change the other pages as well.

Another example is Wikipedia, which uses the language tags as part of 
the URL of each language version. Norwegian Wikipedia started out as a 
common Norwegian project at the address no.wikipedia.org. Later it was 
decided to split it into Nynorsk and Bokmål. Bokmål then remained at 
that address, while Nynorsk moved to nn.wikipedia.org.

With the approach I advocate, one could just change 'no' to 'no-NN' or 
'no-nyn'. Still, one *ought*, in my view, to change the nb version from 
'no' to 'no-NB' or 'no-bok'. But at least it will feel more right if the 
two languages are separated as 'no' versus 'no-nyn' than as 'no' versus 
'nn'.

Also, this is in fact how US English is separated from British English 
in many situations: 'en' contains US English while 'en-GB' contains 
British English. The opposite also exists.

(This approach would not quite solve anything for Wikipedia, though. 
I guess they would have to use 3-letter codes to get anywhere.)
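
To make the difference concrete, here is a minimal sketch of a 
truncating lookup, loosely modelled on the Lookup scheme in RFC 4647 
(section 3.4). The function name `lookup` and the tag lists are my own 
assumptions for illustration; real browsers and servers may negotiate 
differently.

```python
def lookup(requested, available, default=None):
    """Simplified truncating lookup in the spirit of RFC 4647, section 3.4.

    Hypothetical helper for illustration only: it compares tags
    case-insensitively and strips subtags from the end of the requested
    tag until one of the available tags matches.
    """
    by_lower = {tag.lower(): tag for tag in available}
    parts = requested.lower().split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in by_lower:
            return by_lower[candidate]
        parts.pop()
        # A single-character subtag left at the end is dropped as well.
        if parts and len(parts[-1]) == 1:
            parts.pop()
    return default


# A site that so far only carries the tag 'no':
print(lookup("no-nyn", ["no"]))   # falls back to 'no'
print(lookup("nn", ["no"]))       # no fallback: 'nn' shares no prefix with 'no'
print(lookup("en-GB", ["en"]))    # falls back to 'en'
```

Under this kind of matching, a request for 'no-nyn' or 'en-GB' still 
reaches pages tagged 'no' or 'en', whereas a request for 'nn' finds 
nothing on a site tagged 'no'.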

> > When Norwegians themselves, and messages I get from foreigners trying 
> > to understand the Norwegian codes, show that they are not understood, 
> > what shall we do then? Be held hostage by the U.N. Statistics Division, 
> > which has developed those codes for entirely other purposes?
>
> The alternatives are: use the somewhat inappropriate ISO 3166 standard,
> with its dependence on distinctions that are partly political and
> partly economic; or develop our own, and immediately get caught up in
> a never-ending debate.
>
> > Or perhaps Norwegian should be considered a "Macro language",  
>
> That is so in ISO 639-3.
>   

Not sure what you mean. How can I read into/out of ISO 639-3 that 
Norwegian is considered a macro language for Nynorsk and Bokmål?

Or did you mean to say that, given the macro language approach, we 
should choose the 3-letter codes from ISO 639-3 as extended-language 
subtags? Thus we would have:

    no-nno for Norwegian Nynorsk
    no-nob for Norwegian Bokmål

> > and extended-language tags be taken into use to denote each variant? The 
> > 'no-bok' and 'no-nyn' fit perfectly into that picture, don't they? 
>
> In a very hard-fought decision, LTRU decided not to go forward with
> extended language subtags, but to use ISO 639-3 code elements directly
> as language subtags for all languages.
>   

Ok, now I understand better what the article 'Language Tags in HTML and 
XML' says about this.

* http://www.w3.org/International/articles/language-tags/#issues

However, what about no-nor and en-eng, will such tautologies also be 
possible via that approach?

> > BCP 47 now says that nb and nn are preferred and that they "replaced" 
> > no-nyn and no-bok.
>
> Preferred, yes.  But once a tag is valid, it remains valid forever in
> the same meaning: that's a basic rule of BCP 47.
>
> > But if it is as you say, then I would like to propose that 'no-nyn' and 
> > 'no-bok' be made the preferred codes.
>
> At the moment, ietf-languages doesn't have the authority to prefer
> an older (and irregular) tag to a newer ISO equivalent when it
> becomes available.  If you want to change that, post to ltru@ietf.org;
> this is about the last possible moment to do so.  Note that you need
> to propose actual text (you can find the current editorial draft at
> http://inter-locale.com/ID/draft-ietf-ltru-4646bis-13.html ) and you
> need to speak to the *general* issue of allowing ietf-languages to decide
> whether an existing tag should be preferred to a new one, something they
> currently have no discretion on.
>   

Thanks for this info, which seems to be very accurate and realistic. I 
guess I just have to start looking at it.
-- 
Leif Halvard Silli
Received on Friday, 25 April 2008 17:27:22 GMT
