Re: 2 many language tags for Norwegian from Leif Halvard Silli on 2008-04-26 (www-international@w3.org from April to June 2008)

From: Leif Halvard Silli <lhs@malform.no>
Date: Sat, 26 Apr 2008 04:09:02 +0200
To: John Cowan <cowan@ccil.org>
CC: www-international@w3.org
Message-ID: <48128EBE.8020803@malform.no>

John Cowan:

> (If those messages are in English, I'd like to see them, if you wouldn't
> mind.)
>   

If it could be of any help, for a noble goal that I agree with, then I 
guess yes. Please contact me off list.

> > And that points to another problem with the no/nb/nn approach: [...]
>
> Over and over again in standardization we see this happening.  Either
> there is one object which is ambiguous, and so standardization introduces
> two new objects for the individual meanings, or else there are two objects
> with very similar meanings, and standardization introduces a new object
> to cover them both.  ("Object" is deliberately very vague here.)
>
> The result is that the problem has become a worse problem, with three
> objects in use and people very unclear on how to use them.  Sometimes it's
> best to leave bad enough alone.
>   

So, we fall into a pattern, after all. ;)

> > With the approach I advocate, one could just change 'no' to 'no-NN' or 
> > 'no-nyn'. Still one *ought* - in my view - to change the nb version from 
> > 'no' to 'no-NB' or 'no-bok'. But at least it will feel more right if the 
> > two languages are separated as 'no' versus 'no-nyn' than as 'no' versus 
> > 'nn'.
>
> The meat of the argument here is to avoid "no" altogether unless you are
> making no distinction.  Unfortunately people don't do that.
>   

Well, if labeling a page as either 'nb' or (especially) 'nn' can 
sometimes lead to the UA being in doubt about what resource to load, it 
seems safer for that reason alone to use 'no'.

Otherwise, it would be very valuable to know whether pages are in Nynorsk
or Bokmål. (Google appears, in the user interface, to be able to search
only Nynorsk pages or only Bokmål pages. But in reality it doesn't make
any distinction. It could very easily do so, though. It is just a matter
of knowing which words and forms mark out Nynorsk vs. Bokmål.)
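
To illustrate how little is needed for a first approximation, here is a
minimal Python sketch of such a word-based check. The marker lists are my
own tiny, hypothetical samples of function words that differ between the
two written standards, and the name `guess_written_norwegian` is made up
for this example; nothing here reflects what any search engine actually
does.

```python
import re

# Hypothetical and far from complete: common function words that differ
# between the two written standards.
NYNORSK_MARKERS = {"ikkje", "eg", "kva", "frå", "berre", "noko", "korleis", "kvifor"}
BOKMAAL_MARKERS = {"ikke", "jeg", "hva", "fra", "bare", "noe", "hvordan", "hvorfor"}


def guess_written_norwegian(text: str) -> str:
    """Return 'nn', 'nb', or 'no' (undecided) by counting marker words."""
    # Runs of letters only; str patterns are Unicode-aware, so 'å' is kept.
    words = re.findall(r"[^\W\d_]+", text.lower())
    nn_hits = sum(word in NYNORSK_MARKERS for word in words)
    nb_hits = sum(word in BOKMAAL_MARKERS for word in words)
    if nn_hits > nb_hits:
        return "nn"
    if nb_hits > nn_hits:
        return "nb"
    return "no"  # no evidence either way: make no distinction
```

A real classifier would need much longer lists plus inflectional forms,
but the principle is the same: count the forms that belong to only one
of the two standards.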

> > Also, this is in fact how US English is separated from British English in 
> > many situations: 'en' contains US English while 'en-GB' contains British 
> > English. The opposite also exists.
>
> Indeed.  There is no real intelligibility barrier between the two national
> varieties, though, except sometimes in matters of lexis.
>
> > However, what about no-nor and en-eng, will such tautologies also be 
> > possible via that approach?
>
> No.  No tags of the form xx-xxx will be permitted except the grandfathered
> ones that already exist.

So when the W3C i18n article [1] said "[...] an extended-language 
subtag. This new subtag will go immediately after the language subtag 
and before any script tag", this was not accurate information - or, 
at least, is not accurate as of today? As I understand it now, the new 
extended-language subtag will, when used, replace the language subtag.

[1] http://www.w3.org/International/articles/language-tags/#issues

>   Thus for example ar will be Arabic (usually
> Standard Arabic but not necessarily), arb (not ar-arb) will be Standard
> Arabic specifically, and Egyptian Arabic will be arz (not ar-arz).
> The registry will contain information to the effect that arb and arz
> (and about 30 others) are encompassed by the macrolanguage ar.
>   

Having read this, I first thought there was no benefit for my cause in 
the new extended-language subtags. But then, having thought about it, I 
realised that by using 'nob', I say that I use a sublanguage of the 
macrolanguage 'no'/'nor'. And ditto if I use 'nno'.

As a consequence, when using e.g. 'nno', a web browser asking for 
'no' should get 'nno' if 'no' is unavailable. This is the exact 
behaviour I am after. Likewise, if I tell my browser to look for 'nor', 
it should give me both nno and nob - and perhaps ask me to choose, if 
both are available.
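
The behaviour I am after can be sketched in a few lines of Python. This
is only an illustration under my own assumptions: the `MACROLANGUAGE`
table is a hand-written excerpt of the kind of information the registry
is said to carry (using the ISO 639-3 codes 'nob' for Bokmål and 'nno'
for Nynorsk), and `select_variants` is a hypothetical name, not part of
any browser or server.

```python
from typing import List

# Assumed excerpt: individual language -> encompassing macrolanguage.
MACROLANGUAGE = {
    "nb": "no", "nn": "no",    # two-letter codes
    "nob": "no", "nno": "no",  # three-letter codes
    "arb": "ar", "arz": "ar",
}


def select_variants(requested: str, available: List[str]) -> List[str]:
    """Return the available tags that satisfy a request.

    An exact match wins. Otherwise every available tag whose
    macrolanguage is the requested tag is returned, so the caller can
    pick one or ask the user to choose.
    """
    if requested in available:
        return [requested]
    return [tag for tag in available if MACROLANGUAGE.get(tag) == requested]
```

With this, a request for 'no' against a site offering only 'nno' and
'en' yields ['nno'], and against a site offering both 'nno' and 'nob'
it yields both, leaving the choice to the user.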

This seems to be an *important* point about the new extended-language 
subtags - and an important difference from ISO 639-1.

Do you agree with my interpretation here?

> (Note that macrolanguage information, or linguistic relatedness in
> general, doesn't directly tell you what the best fallback is.  The best
> fallback for Arbereshe Albanian is probably not another variety of
> Albanian but Italian; and although Welsh and Breton are closely related
> and marginally mutually intelligible, the best fallback for Welsh is
> English (except for cy-AR, where it's Spanish) and the best fallback
> for Breton is French.)
>   

And in Quebec, Canada, French would be the fallback for English, I 
suppose.

My real issue could perhaps be formulated as: How do we create good 
fallback experiences? Fallback experiences are more important for 
minority languages than for majority languages.

The good news for Arbereshe Albanian, for example, is that it is very 
simple to configure a fallback mechanism there. I suppose there isn't 
even a need to say that you want *Arbereshe* Albanian. Following the 
rule of thumb to make the language tag as short as possible, it should 
be enough to set the browser/server to accept/send out Italian and 
Albanian.

Equipped with a web browser that prefers Albanian over Italian, I will 
almost always get Italian when an Albanian version doesn't exist.
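
A minimal Python sketch of that negotiation, assuming the browser sends
an Accept-Language header such as "sq, it;q=0.8" (Albanian first, then
Italian). The function names are hypothetical, and the matching is plain
exact comparison of tags, which is simpler than what a real server would
do.

```python
from typing import List, Optional, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept-Language header into (tag, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        q = 1.0  # a missing q-value means full preference
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not wanted"
        prefs.append((tag, q))
    # The sort is stable, so equal q-values keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_language(header: str, available: List[str]) -> Optional[str]:
    """Return the most preferred available tag, or None if none match."""
    for tag, q in parse_accept_language(header):
        if q > 0 and tag in available:
            return tag
    return None
```

For a site that offers only 'it' and 'en', the header above selects
'it'; if an 'sq' version exists, it wins.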
-- 
leif halvard silli
Received on Saturday, 26 April 2008 02:09:53 UTC