- From: Éric Bischoff <e.bischoff@noos.fr>
- Date: Wed, 24 Jul 2002 01:28:05 +0200
- To: Paul Grosso <pgrosso@arbortext.com>, www-xsl-fo@w3.org
On Tuesday 23 July 2002 23:05, Paul Grosso wrote:
> The XSL FO subgroup has discussed an issue regarding the allowable
> values of the language property [1] (see [2] for the comment).
>
> At least some WG members believe the XSL spec should require
> the use of 3 character codes for the language property as these
> are clear and unambiguous (at least if Terminology values
> are used when there is a conflict between those and the
> Bibliographic values as is required by RFC 3066 [2]).
>
> Others believe 2 character values (allowed by RFC 3066 and
> allowed as values for the xml:lang shorthand [4]) should also
> be allowed as values for the language property.
>
> Note that, in any case, both 2 and 3 character values are
> allowed for values of the xml:lang shorthand--that is not
> in question (since it is defined by the XML 1.0 spec [5]).
>
> We would like to survey implementations and users on this issue.

We already had the very same argument at the KDE project. Perhaps you can benefit from what came out of it.

There is a normative document somewhere that already addresses this problem, and the KDE project decided to conform to it. If I remember correctly, it is an RFC; sorry, I don't have the reference at hand.

If I remember correctly, that document says that:
- if you have the choice between ISO-639-1 and ISO-639-2 codes, you should take the 2-letter one;
- if you don't have the choice, you should take the 3-letter one.

Example:
"deu", "ger", "de" (German) => choose "de" (2-letter code)
"ven" (Venda) => choose "ven" (3-letter code)

I deeply regret that decision from a personal point of view. My point during the discussion was that such a normative document made many codes in ISO-639-2 useless, so I felt that the document, even if normative, was idiotic. I also feel that 3-letter codes were a positive and unambiguous evolution of the old 2-letter codes, and that there was no point in mixing the two standards.
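As an illustration only, the selection rule described above (take the 2-letter code when one exists, otherwise keep the 3-letter code) might be sketched as follows. The table `TWO_LETTER_FOR` and the function `preferred_code` are hypothetical names, and the table is a tiny hand-picked sample, not a complete ISO 639 registry.

```python
# Hypothetical sample table, NOT a complete ISO 639 registry.
# Keys are ISO 639-2 (3-letter) codes; values are the matching
# ISO 639-1 (2-letter) code, or None when no 2-letter code exists.
TWO_LETTER_FOR = {
    "deu": "de",   # German, terminology code
    "ger": "de",   # German, bibliographic code
    "eng": "en",   # English
    "haw": None,   # Hawaiian: 3-letter code only
}


def preferred_code(code):
    """Return the 2-letter code when one exists, else the 3-letter code."""
    code = code.lower()
    if len(code) == 2:
        # Already a 2-letter code; nothing to shorten.
        return code
    two_letter = TWO_LETTER_FOR.get(code)
    # Codes missing from the sample table are returned unchanged.
    return two_letter if two_letter is not None else code
```

With this sample table, both "deu" and "ger" map to "de", while "haw" stays as it is.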
The main argument against that opinion during the discussion was that ISO 639-1 and -2 are about _codes_, not about their _usage_. The other document (sorry for the missing reference) tells how to _use_ them.

Note that RFC 3066 (referenced by the XSL-FO spec) allows both choices, with no preference. So I feel that implementations of the current XSL-FO specification should support both.

It's late here in France, but tomorrow I can try to track down the reference to the RFC I'm referring to.

> We would like to hear what the various XSL-FO implementations
> accept for values of the language property, specifically,
> whether 2 character language codes are accepted or rejected.

The one I'm currently writing with Arved Sandstrom will accept both possibilities.

> If a given implementation accepts 2 character values (e.g., "EN"),
> how are they interpreted (e.g., does "EN" mean US english,
> British english, or something else)?

I believe this particular point is covered by RFC 3066, which is referenced from the XSL-FO specification:

"en" = English
"en-GB" = British English
"en-US" = American English

Same for 3-letter codes:

"eng"
"eng-GB"
"eng-US"

It's independent of the length of the code ;-). The first, mandatory part is the language as defined in ISO-639-1 or -2; the second, optional part is the country code as defined in ISO-3166.

--
Éric Bischoff
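A minimal sketch of the two-part tag structure described in the message above (a mandatory language code, then an optional country code). The function name `split_language_tag` is hypothetical, and the sketch deliberately handles only this two-part form, not every tag that RFC 3066 permits.

```python
def split_language_tag(tag):
    """Split a tag such as 'en-GB' into (language, country).

    The country part is optional and comes back as None when absent.
    Simplification: only the 'language' and 'language-country' forms
    are accepted here.
    """
    language, sep, country = tag.partition("-")
    # First part: a 2-letter (ISO 639-1) or 3-letter (ISO 639-2) code.
    if not (language.isascii() and language.isalpha()
            and len(language) in (2, 3)):
        raise ValueError("bad language code in %r" % tag)
    # Second part, if a hyphen is present: a 2-letter ISO 3166 code.
    if sep and not (country.isascii() and country.isalpha()
                    and len(country) == 2):
        raise ValueError("bad country code in %r" % tag)
    return language.lower(), (country.upper() or None)
```

The same function handles "en-GB" and "eng-GB" alike, which reflects the point that interpretation does not depend on the length of the language code.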
Received on Tuesday, 23 July 2002 19:27:36 UTC