- From: Leif Halvard Silli <lhs@malform.no>
- Date: Fri, 25 Apr 2008 19:26:33 +0200
- To: John Cowan <cowan@ccil.org>
- CC: www-international@w3.org
John Cowan 25-04-08 16:23:
> Leif Halvard Silli scripsit:
>
> > * The reason why Google uses 'no' for 'nb', instead of 'nb' for 'nb',
> > is not the one you mention.
>
> How do you know this?
>
> Employees of both Google and Yahoo made such comments on LTRU WG telcons.
>

Perhaps only a side note, really, but the e-mail message I sent to
Google in the summer of 2001, asking them for permission to start
translating Google into Nynorsk, still sits in my mailbox. I first had
to explain to them that Nynorsk exists. Et cetera. (I guess I could
have pressed them harder, from the start, about doing these things
right.)

I have not been on those telcons. But the "weight" argument can of
course be used everywhere, in any context: "We choose to really care
only about what counts for most people." There is no particular logic
to it. The outcome can be good or bad, logical or illogical. Therefore
it matters what the options look like. And they are confusing and do
not encourage perfect solutions.

Otherwise, I told you in my message how I know this: Google began with
'nb' first.

And that points to another problem with the no/nb/nn approach: Say you
start out with 'no', either because you use it for a Nynorsk-only or
Bokmål-only site, or because you mix both language forms (a newspaper
offering articles in both language forms, for instance). Then, at a
later moment, you decide to offer parallel versions in nn and nb. What
do you have to do then? You must change all the 'no' tags to either
'nn' or 'nb'. It will not be enough to just change the Nynorsk texts
from 'no' to 'no-nyn' and let 'no' be used for Bokmål. One must switch
*both* to new tags, with the risk that pages and language negotiation
break in the browsers, for a short while at least.

Or take English instead: You want to offer a GB version of a certain
resource. Then you just add en-GB to the related pages. You do not need
to change the other pages as well.
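To make the difference concrete, here is a minimal sketch of prefix
fallback in the style of the RFC 4647 "lookup" scheme. The function
name `lookup` and the tag lists are only illustrative assumptions; this
is not a description of how Google, Wikipedia or any particular browser
actually negotiates languages.

```python
def lookup(requested, available, default=None):
    """Return the best available tag for `requested` by progressively
    truncating subtags from the right (RFC 4647 lookup style)."""
    have = {tag.lower(): tag for tag in available}
    parts = requested.lower().split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in have:
            return have[candidate]
        parts.pop()
        # RFC 4647 also removes a trailing single-character subtag
        if parts and len(parts[-1]) == 1:
            parts.pop()
    return default


# Adding a variant under the same prefix leaves the old tag usable:
print(lookup("en-US", ["en", "en-GB"]))      # falls back to 'en'
print(lookup("no-bok", ["no", "no-nyn"]))    # falls back to 'no'

# 'nb' shares no prefix with 'no', so there is nothing to fall back to:
print(lookup("nb", ["no", "nn"]))            # no match
```

With a shared prefix, a request for a more specific tag degrades to the
plain 'no' or 'en' pages. With 'nn' and 'nb' as separate primary tags,
a request for 'nb' never reaches pages still tagged 'no', which is why
both sets of pages have to be retagged.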
Another example is Wikipedia, which uses the language tags as part of
the URLs of the language versions. Norwegian Wikipedia started out as a
common Norwegian project at the address no.wikipedia.org. Later it was
decided to separate it into Nynorsk and Bokmål. Bokmål then remained at
said address, while Nynorsk moved to nn.wikipedia.org. With the
approach I advocate, one could just change 'no' to 'no-NN' or 'no-nyn'
for Nynorsk. Still, one *ought*, in my view, to change the nb version
from 'no' to 'no-NB' or 'no-bok'. But at least it will feel more right
if the two languages are separated as 'no' versus 'no-nyn' than as 'no'
versus 'nn'. Also, this is in fact how US English is separated from
British English in many situations: 'en' contains US English while
'en-GB' contains British English. The opposite also exists. (This
approach would not quite solve anything for Wikipedia, though. I guess
they would have to use 3-letter codes to get anywhere.)

> > When Norwegians themselves, and messages I get from foreigners trying
> > to understand the Norwegian codes, show that they are not understood,
> > what shall we do then? Be held hostage of U.N. Statistics Division, who
> > has developed those codes for entirely other purposes?
>
> The alternatives are: use the somewhat inappropriate ISO 3166 standard,
> with its dependence on distinctions that are partly political and
> partly economic; or develop our own, and immediately get caught up in
> a never-ending debate.
>
> > Or perhaps Norwegian should be considered a "Macro language",
>
> That is so in ISO 639-3.
>

Not sure what you mean. How can I read out of ISO 639-3 that Norwegian
is considered a macrolanguage for Nynorsk and Bokmål? Or did you mean
to say that, given the macrolanguage approach, we should choose the
3-letter codes from ISO 639-3 as extended-language subtags? Thus we
would have:

  no-nno for Norwegian Nynorsk
  no-nob for Norwegian Bokmål

> > and extended-language tags be taken into use to denote each variant?
> > The 'no-bok' and 'no-nyn' fit perfectly into that picture, don't they?
>
> In a very hard-fought decision, LTRU decided not to go forward with
> extended language subtags, but to use ISO 639-3 code elements directly
> as language subtags for all languages.
>

OK, now I understand better what the article 'Language Tags in HTML and
XML' says about this.

* http://www.w3.org/International/articles/language-tags/#issues

However, what about no-nor and en-eng? Will such tautologies also be
possible via that approach?

> > BCP 47 now says that nb and nn are preferred and that they "replaced"
> > no-nyn and no-bok.
>
> Preferred, yes. But once a tag is valid, it remains valid forever in
> the same meaning: that's a basic rule of BCP 47.
>
> > But if it is as you say, then I would like to propose that 'no-nyn' and
> > 'no-bok' be made the preferred codes.
>
> At the moment, ietf-languages doesn't have the authority to prefer
> an older (and irregular) tag to a newer ISO equivalent when it
> becomes available. If you want to change that, post to ltru@ietf.org;
> this is about the last possible moment to do so. Note that you need
> to propose actual text (you can find the current editorial draft at
> http://inter-locale.com/ID/draft-ietf-ltru-4646bis-13.html ) and you
> need to speak to the *general* issue of allowing ietf-languages to decide
> whether an existing tag should be preferred to a new one, something they
> currently have no discretion on.
>

Thanks for this info, which seems to be very accurate and realistic. I
guess I just have to start looking at it.
-- 
Leif Halvard Silli
Received on Friday, 25 April 2008 17:27:22 UTC