Re: 2 many language tags for Norwegian

From: Leif Halvard Silli <lhs@malform.no>
Date: Sat, 26 Apr 2008 05:58:56 +0200
Message-ID: <4812A880.3070603@malform.no>
To: John Cowan <cowan@ccil.org>
CC: www-international@w3.org

John Cowan:
> Leif Halvard Silli scripsit: ...
> > But otherwise, it would be very valuable to know whether pages are in
> > Nynorsk or Bokmål. (Google appears, in its user interface, to be able to
> > search only Nynorsk pages or only Bokmål pages. But in reality it doesn't
> > make any distinction. It could very easily do so, though. It is just a
> > matter of knowing which words and forms mark out Nynorsk vs. Bokmål.)
>
> I don't think it reveals any material nonpublic facts to say that: 
>   

> [...] 3) Only certain existing language tags are useful in this process (for
> example, "en" is worth nothing, 

'not worth nothing', I guess you meant.

> because a huge fraction of non-English
> content is mechanically tagged "en" by broken HTML composers, HTTP
> servers, etc.);
>
> I don't know what criteria Google uses to decide which languages are
> cost-effective to detect.
>   

One important criterion is certainly AdWords. If Google had offered 
AdWords in Nynorsk, then a) it would have been good for Nynorsk, and 
b) they would have tagged pages as Nynorsk.

> > So when the W3C i18n article [1] said "[...] an extended-language 
> > subtag. This new subtag will go immediately after the language subtag 
> > and before any script tag", then this was not accurate information - or, 
> > at least not accurate as of today?
>
> It was for many years the plan, but compelling arguments induced the LTRU
>   

Such as?

> WG to abandon the plan and treat all languages as syntactically equal:
> each language and macrolanguage is represented directly by a 2-letter or
> 3-letter language subtag, and extended-language subtags will not be used.
>
> However, if there is a 2-letter subtag for a language or macrolanguage,
> it will be used in preference to the 3-letter form.  So 'nno', 'nbo',
> and 'nor' will never be valid BCP 47 language subtags.
>   

Gotcha.
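
To check that I have understood the rule, here is a minimal sketch in 
Python. The table and the function name are my own and purely 
illustrative: a hand-typed excerpt, not the IANA registry. I write 'nob' 
for Bokmål, which as far as I know is the actual ISO 639-2 code, where 
the thread above has 'nbo'.

```python
# Illustrative, hand-typed excerpt only; NOT the IANA subtag registry.
# 'nob' is the ISO 639-2 code for Bokmål.
PREFERRED_2_LETTER = {
    "nor": "no",  # Norwegian (macrolanguage)
    "nno": "nn",  # Nynorsk
    "nob": "nb",  # Bokmål
}


def preferred_language_subtag(tag):
    """Swap a 3-letter primary subtag for its 2-letter form, if one is listed.

    Hypothetical helper: only the primary language subtag is touched,
    and anything missing from the excerpt above is returned unchanged.
    """
    subtags = tag.split("-")
    primary = subtags[0].lower()
    subtags[0] = PREFERRED_2_LETTER.get(primary, primary)
    return "-".join(subtags)
```

With this excerpt, preferred_language_subtag("nno") gives "nn", while a 
code with no 2-letter form, such as "aae" for Arbereshe Albanian, is 
left as it is.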

> > Having read this, I first thought there was no benefit for my cause in 
> > the new extended-language subtags. But then, having thought about it, I 
> > realised that by using 'nbo', I say that I use a sublanguage of the 
> > macrolanguage 'no'/'nor'. And ditto if I use 'nno'.
>
> And you say the same thing (only conformantly) if you use 'nb' and 'nn'.
>
> > As a consequence, when using e.g. 'nno', a web browser asking for 
> > 'no' should get 'nno' if 'no' is unavailable. This is the exact 
> > behaviour I am after. Likewise, by telling my browser to look for 'nor', 
> > it should give me both nno and nbo - and perhaps ask me to choose, if 
> > both are available.
>
> Changing to different (and invalid)

What do you mean by 'invalid'? Not 'no-nyn' and 'no-bok', I suppose? (I 
have not advocated use of tags not part of BCP 47.)

>  tags doesn't change the story.
> If you want nn and nb in that order of preference, set your browser
> to ask for nn, no, and nb in that order.
>   

Somewhere the relationship between nn, no, nb must be better specified.
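
To make the point concrete, here is a rough sketch in Python of how I 
imagine a server choosing a language. It is an assumption on my part, not 
how any particular browser or server works. The matching is a simplified 
version of prefix truncation in the spirit of RFC 4647 lookup, and the 
RELATED table is entirely hypothetical: nothing in BCP 47 specifies it, 
which is exactly the gap I mean.

```python
def parse_accept_language(header):
    """Return the language ranges of an Accept-Language value, best first."""
    ranges = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        q = 1.0  # a range without a q-value counts as q=1
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if q > 0:
            ranges.append((q, position, tag))
    # Highest q first; ties keep the order in which the user listed them.
    ranges.sort(key=lambda r: (-r[0], r[1]))
    return [tag for _, _, tag in ranges]


# HYPOTHETICAL relationship table; not taken from BCP 47 or any registry.
RELATED = {
    "no": ["nb", "nn"],
    "nn": ["no", "nb"],
    "nb": ["no", "nn"],
}


def truncations(tag):
    """Yield the tag, then ever shorter prefixes: 'nn-no', then 'nn'."""
    subtags = tag.split("-")
    while subtags:
        yield "-".join(subtags)
        subtags.pop()


def choose(header, available, related=None):
    """Pick one available language for the request, or None."""
    available = {a.lower() for a in available}
    wanted = parse_accept_language(header)
    # First pass: what the user literally asked for, with truncation.
    for tag in wanted:
        for candidate in truncations(tag):
            if candidate in available:
                return candidate
    # Second pass (optional): the hypothetical relationship table.
    if related:
        for tag in wanted:
            for candidate in related.get(tag, []):
                if candidate in available:
                    return candidate
    return None
```

With the header "nn, no;q=0.9, nb;q=0.8" and only 'nb' and 'en' 
available, choose() returns 'nb', which is your advice. But a request for 
plain 'no' against a site that has only 'nn' and 'en' returns None, 
unless something like RELATED is passed in, in which case it returns 'nn'.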

> > And in Quebec, Canada, French would be the fallback for English, I 
> > suppose.
>
> It all depends.  Anyhow, I was trying to use examples that aren't
> politically controversial.
>   

So did I. I thought I offered an uncontroversial example. The government 
of Quebec uses French, I believe, and thus French is the language of 
administration in that province. So far, no controversy, right?

If a citizen reads the English versions of government documents and 
suddenly there isn't an English version of the next document, then I think 
that citizen would be glad to be offered the French version instead.

Whether he will read it, or is able to, is another issue, which 
doesn't affect the status of French as the fallback in Quebec.

Though, still in Quebec: reading information from the central government 
as a French speaker, you would of course be happy to receive English if a 
certain document was unavailable in French. (But I guess this is 
controversial as well?)

But of course, it all depends. One can always configure the browser, or 
actively go against the current status. And it often seems very 
"controversial" if those who are the majority in one context suddenly are 
the minority.

> > The good news for Arbereshe Albanian, for example, is that it is very 
> > simple to configure a fallback mechanism there. I suppose there isn't 
> > even a need to tell that you want *Arbereshe* Albanian. Following the 
> > rule of thumb to make the language tag as short as possible, it should 
> > be enough to set the browsers/server to accept/send out Italian and 
> > Albanian.
>
> Well, no.  The idea in that case is that if you know Arbereshe Albanian,
> you probably can't understand standard Albanian at all, or only very
> poorly.  However, you are almost certainly fully bilingual in Italian.
>   

Well, yes, I'd say, again. Where will you find web sites which offer 
Albanian and Italian in parallel? OK, I forgot the obvious: Google, etc. 
True, it would not work for those sites. So, yes, there you are right. 
Though it depends. I know people who prefer Swedish over Bokmål.
-- 
leif halvard silli
Received on Saturday, 26 April 2008 03:59:37 GMT
