
Re: 2 many language tags for Norwegian

From: John Cowan <cowan@ccil.org>
Date: Fri, 25 Apr 2008 14:48:58 -0400
To: Leif Halvard Silli <lhs@malform.no>
Cc: John Cowan <cowan@ccil.org>, www-international@w3.org
Message-ID: <20080425184858.GM13977@mercury.ccil.org>

Leif Halvard Silli scripsit:

> Perhaps only a side note, really, but the e-mail message I sent to 
> Google in the summer of 2001, asking them for permission to start 
> translating Google into Nynorsk, still sits in my mailbox. I first had 
> to explain to them that Nynorsk exists. Et cetera. (I guess I 
> could have pressed them more, from the start, about doing these things 
> right.)

Google's a big place, and the localization folks might not, that long ago,
have been properly hooked up with the language-classification folks.

(If those messages are in English, I'd like to see them, if you wouldn't
mind.)

> And that points to another problem with the no/nb/nn approach. Say you 
> start out with 'no', either because you use it for a Nynorsk-only or 
> Bokmål-only site, or because you mix both language forms (a newspaper 
> offering articles in both language forms, for instance). Then, at a 
> later moment, you decide to offer parallel versions in nn and nb. And 
> what do you have to do then? Then you must change all the 'no' tags to 
> either 'nn' or 'nb'. It will not be enough to just change the 
> Nynorsk texts from 'no' to 'no-nyn', and then to let 'no' be used for 
> Bokmål. One must switch *both* to new tags, with the risk that pages and 
> language negotiation break, for a short while at least, in the browsers.

Over and over again in standardization we see this happening.  Either
there is one object which is ambiguous, and so standardization introduces
two new objects for the individual meanings, or else there are two objects
with very similar meanings, and standardization introduces a new object
to cover them both.  ("Object" is deliberately very vague here.)

The result is that the problem has become a worse problem, with three
objects in use and people very unclear on how to use them.  Sometimes it's
best to leave bad enough alone.

> With the approach I advocate, one could just change 'no' to 'no-NN' or 
> 'no-nyn'. Still one *ought* - in my view - to change the nb version from 
> 'no' to 'no-NB' or 'no-bok'. But at least it will feel more right if the 
> two languages are separated as 'no' versus 'no-nyn' than as 'no' versus 
> 'nn'.

The meat of the argument here is to avoid "no" altogether unless you are
making no distinction.  Unfortunately people don't do that.

> Also, this is in fact how US English is separated from British English in 
> many situations: 'en' contains US English while 'en-GB' contains British 
> English. The opposite also exists.

Indeed.  There is no real intelligibility barrier between the two national
varieties, though, except sometimes in matters of lexis.

> However, what about no-nor and en-eng, will such tautologies also be 
> possible via that approach?

No.  No tags of the form xx-xxx will be permitted except the grandfathered
ones that already exist.  Thus for example ar will be Arabic (usually
Standard Arabic but not necessarily), arb (not ar-arb) will be Standard
Arabic specifically, and Egyptian Arabic will be arz (not ar-arz).
The registry will contain information to the effect that arb and arz
(and about 30 others) are encompassed by the macrolanguage ar.
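
A minimal sketch, in Python, of how an application might consult that
macrolanguage information.  The table is a hand-picked excerpt, not the
registry itself, and the names MACROLANGUAGE, macrolanguage_of and
same_macrolanguage are hypothetical, chosen only for illustration.

```python
# Hypothetical excerpt of registry data: encompassed language subtag
# mapped to its macrolanguage. The real registry lists many more entries.
MACROLANGUAGE = {
    "arb": "ar",  # Standard Arabic
    "arz": "ar",  # Egyptian Arabic
    "nb": "no",   # Norwegian Bokmål
    "nn": "no",   # Norwegian Nynorsk
}


def macrolanguage_of(subtag):
    """Return the encompassing macrolanguage, or None if none is recorded."""
    return MACROLANGUAGE.get(subtag.lower())


def same_macrolanguage(a, b):
    """True if both subtags fall under the same macrolanguage.

    A subtag with no recorded macrolanguage stands for itself, so
    'no' and 'nn' compare as related.
    """
    a, b = a.lower(), b.lower()
    return MACROLANGUAGE.get(a, a) == MACROLANGUAGE.get(b, b)
```

Here macrolanguage_of("arz") gives "ar", and same_macrolanguage("nn", "no")
is true.  Nothing in this table says which language to fall back to; it
records only the encompassing relation.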

(Note that macrolanguage information, or linguistic relatedness in
general, doesn't directly tell you what the best fallback is.  The best
fallback for Arbereshe Albanian is probably not another variety of
Albanian but Italian; and although Welsh and Breton are closely related
and marginally mutually intelligible, the best fallback for Welsh is
English (except for cy-AR, where it's Spanish) and the best fallback
for Breton is French.)
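
Because relatedness does not determine fallback, an application would
need its own explicit fallback table.  The following is a minimal sketch
in Python under that assumption; the table FALLBACK and the function
best_fallback are hypothetical, and the entries are just the examples
given above (aae is the subtag for Arbereshe Albanian).

```python
# Hypothetical, hand-maintained fallback table. Keys are lowercased tags;
# values are the language to offer when the requested one is unavailable.
FALLBACK = {
    "aae": "it",    # Arbereshe Albanian -> Italian
    "cy": "en",     # Welsh -> English
    "cy-ar": "es",  # Welsh as used in Argentina -> Spanish
    "br": "fr",     # Breton -> French
}


def best_fallback(tag):
    """Look up a fallback for tag, trying the most specific form first.

    Subtags are dropped from the right until a table entry matches.
    Returns None when the table has no entry for the tag.
    """
    parts = tag.lower().split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in FALLBACK:
            return FALLBACK[candidate]
        parts.pop()
    return None
```

With this table, best_fallback("cy-AR") gives "es" while best_fallback("cy-GB")
drops the region and gives "en".  A tag with no entry, such as "sq",
gives None, leaving the choice to the caller.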

-- 
Values of beeta will give rise to dom!          John Cowan
(5th/6th edition 'mv' said this if you tried    http://www.ccil.org/~cowan
to rename '.' or '..' entries; see              cowan@ccil.org
http://cm.bell-labs.com/cm/cs/who/dmr/odd.html)
Received on Friday, 25 April 2008 18:49:41 GMT