Re: 2 many language tags for Norwegian

From: John Cowan <cowan@ccil.org>
Date: Fri, 25 Apr 2008 23:01:19 -0400
To: Leif Halvard Silli <lhs@malform.no>
Cc: John Cowan <cowan@ccil.org>, www-international@w3.org
Message-ID: <20080426030119.GB24609@mercury.ccil.org>

Leif Halvard Silli scripsit:

> Well, if labeling a page as either 'nb' or (especially) 'nn' can 
> sometimes lead to the UA being in doubt about what resource to load, it 
> seems safer for that reason alone to use 'no'.

Yes.  But as you say, if you are going to have separate resources,
you should tag them nb and nn.

> Otherwise, it would be very valuable to know if pages are in Nynorsk or
> Bokmål. (Google appears, in the user interface, to be able to search
> only Nynorsk pages or only Bokmål pages. But in reality it doesn't make
> any distinction. It could very easily do so, though. It is just a matter
> of knowing which words and forms mark out Nynorsk vs. Bokmål.)

I don't think it reveals any material nonpublic facts to say that:

1) Google tags every page with a language tag as it indexes it;

2) The language tag is derived from a variety of indicia, including
characters in use, letter statistics, keywords, and any existing
language tag;

3) Only certain existing language tags are useful in this process (for
example, "en" is worth nothing, because a huge fraction of non-English
content is mechanically tagged "en" by broken HTML composers, HTTP
servers, etc.).

I don't know what criteria Google uses to decide which languages are
cost-effective to detect.
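
As a rough illustration of points 1-3 only, here is a toy sketch in Python.
It is not Google's method: the marker-word lists, the weight given to a
declared tag, and the names guess_language, MARKER_WORDS and
UNTRUSTED_DECLARED_TAGS are all assumptions made up for this example.

```python
from collections import Counter

# Assumption: declared tags that are applied mechanically so often that
# they carry no signal ("en" is the example given above).
UNTRUSTED_DECLARED_TAGS = {"en"}

# Hypothetical marker words; a real detector would use much larger
# statistical models (character n-grams, letter frequencies, and so on).
MARKER_WORDS = {
    "nn": {"ikkje", "eg", "kva", "frå"},
    "nb": {"ikke", "jeg", "hva", "fra"},
}


def guess_language(text, declared_tag=None):
    """Return the best-scoring language tag, or None if nothing matched."""
    votes = Counter()
    # Keyword evidence: one vote per marker word found in the text.
    for word in text.lower().split():
        for lang, markers in MARKER_WORDS.items():
            if word in markers:
                votes[lang] += 1
    # An existing tag counts only if it is not known to be unreliable.
    if declared_tag and declared_tag not in UNTRUSTED_DECLARED_TAGS:
        votes[declared_tag] += 2  # arbitrary weight, chosen for illustration
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

With these assumptions, a page declared "en" but containing no marker words
yields no guess at all, because the declared tag is simply ignored.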

> So when the W3C i18n article [1] said "[...] an extended-language 
> subtag. This new subtag will go immediately after the language subtag 
> and before any script tag", then this was not accurate information - or, 
> at least not accurate as of today?

It was for many years the plan, but compelling arguments induced the LTRU
WG to abandon the plan and treat all languages as syntactically equal:
each language and macrolanguage is represented directly by a 2-letter or
3-letter language subtag, and extended-language subtags will not be used.

However, if there is a 2-letter subtag for a language or macrolanguage,
it will be used in preference to the 3-letter form.  So 'nno', 'nbo',
and 'nor' will never be valid BCP 47 language subtags.
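
The "shortest code wins" rule can be sketched as follows. This is only an
illustration: the THREE_TO_TWO table is a hand-written excerpt, and the
function name preferred_language_subtag is hypothetical. A real tool would
read the IANA Language Subtag Registry. Note that the ISO 639-2 code for
Bokmål is 'nob'.

```python
# Hand-written excerpt for illustration; not a complete mapping.
THREE_TO_TWO = {
    "nor": "no",  # Norwegian (macrolanguage)
    "nob": "nb",  # Norwegian Bokmål
    "nno": "nn",  # Norwegian Nynorsk
    "fra": "fr",  # French
}


def preferred_language_subtag(code: str) -> str:
    """Return the 2-letter subtag if one exists, else the code unchanged."""
    code = code.lower()
    return THREE_TO_TWO.get(code, code)
```

A code with no 2-letter equivalent in the table, such as 'aae', is returned
as it is.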

> Having read this, I first thought there is no benefit for my cause in 
> the new extended-language subtags. But then, having thought about it, I 
> realised that by using 'nbo', then I say that I use a sublanguage of the 
> macrolanguage 'no'/'nor'. And ditto if I use 'nno'.

And you say the same thing (only conformantly) if you use 'nb' and 'nn'.

> As a consequence, when using e.g. 'nno', then a web browser asking for 
> 'no', should get 'nno' if 'no' is unavailable. This is the exact
> behaviour I am after. Likewise, by telling my browser to look for 'nor', 
> it should give me both nno and nbo - and perhaps ask me to choose, if 
> both are available.

Changing to different (and invalid) tags doesn't change the story.
If you want nn and nb in that order of preference, set your browser
to ask for nn, no, and nb in that order.
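
To see why the ordered list does the work, here is a simplified sketch of
lookup-style matching in the spirit of RFC 4647. The function name lookup
and its arguments are assumptions for this example, and the sketch omits
details of the real algorithm (for instance, special handling of
single-letter subtags when truncating).

```python
def lookup(priority_list, available):
    """Return the first available tag that satisfies the preferences.

    priority_list: language ranges in the user's order of preference.
    available: language tags of the resources the server can offer.
    """
    by_lower = {tag.lower(): tag for tag in available}
    for language_range in priority_list:
        candidate = language_range.lower()
        while candidate:
            if candidate in by_lower:
                return by_lower[candidate]
            # Drop the last subtag ("nn-NO" -> "nn") and try again;
            # a bare language subtag truncates to "" and ends the loop.
            candidate = candidate.rpartition("-")[0]
    return None
```

With preferences ["nn", "no", "nb"], a server offering only nb and en
returns nb, and one offering no and nb returns no. A request for ["no"]
alone against a server offering only nn and nb matches nothing in this
sketch, which is why each acceptable tag has to be listed explicitly.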

> And in Quebec, Canada, French would be the fallback for English, I
> suppose.

It all depends.  Anyhow, I was trying to use examples that aren't
politically controversial.

> The good news for, for example, Arbereshe Albanian is that it is very
> simple to configure a fallback mechanism there. I suppose there isn't 
> even a need to tell that you want *Arbereshe* Albanian. Following the 
> rule of thumb to make the language tag as short as possible, it should 
> be enough to set the browsers/server to accept/send out Italian and 
> Albanian.

Well, no.  The idea in that case is that if you know Arbereshe Albanian,
you probably can't understand standard Albanian at all, or only very
poorly.  However, you are almost certainly fully bilingual in Italian.

-- 
And it was said that ever after, if any                 John Cowan
man looked in that Stone, unless he had a               cowan@ccil.org
great strength of will to turn it to other              http://ccil.org/~cowan
purpose, he saw only two aged hands withering
in flame.   --"The Pyre of Denethor"
Received on Saturday, 26 April 2008 03:02:02 GMT
