- From: Leif Halvard Silli <lhs@malform.no>
- Date: Fri, 25 Apr 2008 19:26:33 +0200
- To: John Cowan <cowan@ccil.org>
- CC: www-international@w3.org
John Cowan 25-04-08 16:23:
> Leif Halvard Silli scripsit:
>
> > * The reason why Google uses 'no' for 'nb', instead of 'nb' for 'nb',
> > is not the one you mention.
>
> How do you know this?
>
> Employees of both Google and Yahoo made such comments on LTRU WG telcons.
>
Perhaps only a side note, really, but the e-mail message I sent to
Google in the summer of 2001, asking for permission to start
translating Google into Nynorsk, is still in my mailbox. I first had to
explain to them that Nynorsk exists. Et cetera. (I guess I could have
pressed them harder, from the start, about doing these things right.)
I have not been on those telcons. But the "weight" argument can of
course be used everywhere, in any context. "We choose to only really
care for those that count for most people." There is no particular logic
to it. The outcome can be good or bad, logical or illogical. Therefore
it matters what the options look like. And they are confusing and do not
encourage perfect solutions.
Otherwise, I told you in my message how I know this: Google began with
'nb'.
And that points to another problem with the no/nb/nn approach. Suppose you
start out with 'no', either because you use it for a Nynorsk-only or
Bokmål-only site, or because you mix both language forms (a newspaper
offering articles in both, for instance). Then, at a later moment, you
decide to offer parallel versions in nn and nb. What do you have to do
then? You must change all the 'no' tags to either 'nn' or 'nb'. It will
not be enough to change just the Nynorsk texts from 'no' to 'no-nyn' and
let 'no' keep being used for Bokmål. One must switch *both* to new tags,
with the risk that pages and language negotiation break, for a short
while at least, in the browsers.
Or take English instead: You want to offer a GB version of a certain
resource. Then you just add en-GB to the related pages. You do not need
to change the other pages as well.
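The difference between the two cases can be sketched with a simplified, truncation-based tag lookup, loosely modelled on the "lookup" scheme of RFC 4647. The function name `lookup` and the sample tag lists are assumptions made for illustration only; real browsers and servers may negotiate differently.

```python
def lookup(requested, available, default=None):
    """Simplified truncation lookup: drop trailing subtags until a tag matches.

    This is an illustrative sketch, not a full RFC 4647 implementation
    (it ignores wildcards and the special handling of single-letter subtags).
    """
    by_lower = {tag.lower(): tag for tag in available}
    subtags = requested.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()  # e.g. "en-gb" -> "en"
    return default


# Adding 'en-GB' next to 'en' leaves requests for other English untouched.
print(lookup("en-US", ["en", "en-GB"]))
print(lookup("en-GB", ["en", "en-GB"]))

# 'no-nyn' shares a prefix with 'no', so it can fall back to a 'no' page;
# 'nn' shares no prefix with 'no', so nothing matches.
print(lookup("no-nyn", ["no"]))
print(lookup("nn", ["no"]))
```

Under this kind of matching, a request for 'no-nyn' still reaches pages tagged 'no', whereas a request for 'nn' does not, which is why moving from 'no' to 'nn'/'nb' means retagging everything.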
Another example is Wikipedia, which uses the language tags as part of the
URL of each language version. Norwegian Wikipedia started out as a
common Norwegian project at the address no.wikipedia.org. Later it was
decided to split it into Nynorsk and Bokmål. Bokmål then remained at
that address, while Nynorsk moved to nn.wikipedia.org.
With the approach I advocate, one could just change 'no' to 'no-NN' or
'no-nyn'. Still one *ought* - in my view - to change the nb version from
'no' to 'no-NB' or 'no-bok'. But at least it will feel more right if the
two languages are separated as 'no' versus 'no-nyn' than as 'no' versus
'nn'.
Also, this is in fact how US English is separated from British English in
many situations: 'en' contains US English while 'en-GB' contains British
English. The opposite also exists.
(This approach would not quite solve anything for Wikipedia, though.
I guess they would have to use 3-letter codes to get anywhere.)
> > When Norwegians themselves, and messages I get from foreigners trying
> > to understand the Norwegian codes, show that they are not understood,
> > what shall we do then? Be held hostage of U.N. Statistics Division, who
> > has developed those codes for entirely other purposes?
>
> The alternatives are: use the somewhat inappropriate ISO 3166 standard,
> with its dependence on distinctions that are partly political and
> partly economic; or develop our own, and immediately get caught up in
> a never-ending debate.
>
> > Or perhaps Norwegian should be considered a "Macro language",
>
> That is so in ISO 639-3.
>
Not sure what you mean. How can I read out of ISO 639-3 that
Norwegian is considered a macrolanguage for Nynorsk and Bokmål?
Or did you mean to say that, given the macrolanguage approach, we should
choose the 3-letter codes from ISO 639-3 as extended-language subtags?
Thus we would have:
no-nno for Norwegian Nynorsk
no-nob for Norwegian Bokmål
> > and extended-language tags be taken into use to denote each variant? The
> > 'no-bok' and 'no-nyn' fit perfectly into that picture, don't they?
>
> In a very hard-fought decision, LTRU decided not to go forward with
> extended language subtags, but to use ISO 639-3 code elements directly
> as language subtags for all languages.
>
Ok, now I understand better what the article 'Language Tags in HTML and
XML' says about this.
* http://www.w3.org/International/articles/language-tags/#issues
However, what about no-nor and en-eng, will such tautologies also be
possible via that approach?
> > BCP 47 now says that nb and nn are preferred and that they "replaced"
> > no-nyn and no-bok.
>
> Preferred, yes. But once a tag is valid, it remains valid forever in
> the same meaning: that's a basic rule of BCP 47.
>
> > But if it is as you say, then I would like to propose that 'no-nyn' and
> > 'no-bok' were made the preferred codes.
>
> At the moment, ietf-languages doesn't have the authority to prefer
> an older (and irregular) tag to a newer ISO equivalent when it
> becomes available. If you want to change that, post to ltru@ietf.org;
> this is about the last possible moment to do so. Note that you need
> to propose actual text (you can find the current editorial draft at
> http://inter-locale.com/ID/draft-ietf-ltru-4646bis-13.html ) and you
> need to speak to the *general* issue of allowing ietf-languages to decide
> whether an existing tag should be preferred to a new one, something they
> currently have no discretion on.
>
Thanks for this info, which seems to be very accurate and realistic. I
guess I just have to start looking at it.
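The "valid forever, but with a preferred replacement" rule discussed above could be sketched as a small mapping from deprecated tags to their preferred values. The table below is hand-copied for illustration and covers only the two Norwegian tags from this thread; the names `PREFERRED` and `canonicalize` are hypothetical, and a real implementation would presumably read the full IANA language subtag registry instead.

```python
# Illustrative, hand-written table: deprecated tag -> preferred value.
# Only the two tags discussed in this thread are listed (an assumption
# made for brevity, not a copy of the whole registry).
PREFERRED = {
    "no-bok": "nb",  # Norwegian Bokmål
    "no-nyn": "nn",  # Norwegian Nynorsk
}


def canonicalize(tag):
    """Return the preferred form of a tag, or the tag unchanged if none is listed."""
    return PREFERRED.get(tag.lower(), tag)


# The old tags stay valid as input; they are simply mapped to the newer ones.
print(canonicalize("no-nyn"))
print(canonicalize("no-bok"))
print(canonicalize("en-GB"))
```

Reversing the preference, as proposed above, would amount to swapping the keys and values in this table, which is exactly what ietf-languages currently has no discretion to do.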
--
Leif Halvard Silli
Received on Friday, 25 April 2008 17:27:22 UTC