Re: Question on implementation of the language property from Éric Bischoff on 2002-07-23 (www-xsl-fo@w3.org from July 2002)

From: Éric Bischoff <e.bischoff@noos.fr>
Date: Wed, 24 Jul 2002 01:28:05 +0200
To: Paul Grosso <pgrosso@arbortext.com>, www-xsl-fo@w3.org
Message-Id: <200207240128.05539.e.bischoff@noos.fr>
On Tuesday 23 July 2002 23:05, Paul Grosso wrote:
> The XSL FO subgroup has discussed an issue regarding the allowable
> values of the language property [1] (see [2] for the comment).
>
> At least some WG members believe the XSL spec should require
> the use of 3 character codes for the language property as these
> are clear and unambiguous (at least if Terminology values
> are used when there is a conflict between those and the
> Bibliographic values as is required by RFC 3066 [2]).
>
> Others believe 2 character values (allowed by RFC 3066 and
> allowed as values for the xml:lang shorthand [4]) should also
> be allowed as values for the language property.
>
> Note that, in any case, both 2 and 3 character values are
> allowed for values of the xml:lang shorthand--that is not
> in question (since it is defined by the XML 1.0 spec [5]).
>
> We would like to survey implementations and users on this issue.

We already had the very same argument at the KDE project. Perhaps you can 
benefit from what came out of it.

There is a normative document somewhere that already addresses this 
problem. The KDE project decided to conform to it. If I remember correctly, 
it is an RFC; sorry, I don't have the reference at hand.

If I remember correctly, that document says that:
- if you have the choice between ISO 639-1 and ISO 639-2 codes, you should 
take the 2-letter one;
- if you don't have the choice, you should take the 3-letter one.

Example:
	"deu", "ger", "de" (German) => choose "de" (2-letter code)
	"ven" (Venda) => choose "ven" (3-letter code)

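As a rough illustration of the rule described above, here is a minimal 
Python sketch. The function name preferred_code is hypothetical, and the 
candidate lists are simply the two examples from this message, not data 
taken from the ISO 639 tables. The sketch does not decide between the 
Bibliographic and Terminology variants when several 3-letter codes are 
offered; it just returns the first one it sees.

```python
def preferred_code(candidates):
    """Pick a language code following the rule sketched in this message:
    take the 2-letter code when one is offered, otherwise a 3-letter one.

    `candidates` is an iterable of codes that all denote the same language.
    """
    candidates = list(candidates)

    # Rule 1: a 2-letter (ISO 639-1 style) code wins when it is available.
    for code in candidates:
        if len(code) == 2:
            return code

    # Rule 2: otherwise fall back to a 3-letter (ISO 639-2 style) code.
    # Assumption: the first 3-letter candidate is good enough here; the
    # Bibliographic/Terminology question is deliberately left open.
    for code in candidates:
        if len(code) == 3:
            return code

    raise ValueError("no 2- or 3-letter code among %r" % (candidates,))


if __name__ == "__main__":
    print(preferred_code(["deu", "ger", "de"]))  # de
    print(preferred_code(["ven"]))               # ven
```
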
From a personal point of view, I deeply regret that decision. My point during 
the discussion was that such a normative document makes many ISO 639-2 codes 
useless, so I felt that the document, even though normative, was idiotic. I 
also feel that the 3-letter codes were a positive and unambiguous evolution 
of the old 2-letter codes, and that there was no point in mixing the two 
standards. The main argument against that opinion during the discussion was 
that ISO 639-1 and -2 are about _codes_, not about their _usage_. The other 
document (sorry for the missing reference) says how to _use_ them.

Note that RFC 3066 (referenced by the XSL-FO spec) allows both choices, 
with no preference. So my feeling is that implementations of the current 
XSL-FO specification should support both.

It's late here in France, but tomorrow I can try to track down the missing 
reference to the RFC I'm referring to.

> We would like to hear what the various XSL-FO implementations
> accept for values of the language property, specifically,
> whether 2 character language codes are accepted or rejected.

The one I'm currently writing with Arved Sandstrom will accept both 
possibilities.

> If a given implementation accepts 2 character values (e.g., "EN"),
> how are they interpreted (e.g., does "EN" mean US english,
> British english, or something else)?

I believe that this particular point is covered by RFC 3066, which is 
referenced from the XSL-FO specification:
	"en" = English
	"en-GB" = British English
	"en-US" = American English

The same goes for 3-letter codes:
	"eng"
	"eng-GB"
	"eng-US"

It's independent of the length of the code ;-). The first, mandatory part is 
the language as defined in ISO 639-1 or -2; the second, optional part is the 
country code as defined in ISO 3166.
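
To make that two-part structure concrete, here is a minimal Python sketch. 
The name split_language_tag is hypothetical, and the sketch assumes only the 
two shapes discussed in this message: a 2- or 3-letter language code, 
optionally followed by a hyphen and a 2-letter country code. RFC 3066 allows 
other forms of tag as well, which this sketch simply rejects. Case is 
normalised because the question above used "EN".

```python
import re

# Assumption: only "ll", "lll", "ll-CC" and "lll-CC" are handled here.
_TAG = re.compile(r"([A-Za-z]{2,3})(?:-([A-Za-z]{2}))?")


def split_language_tag(tag):
    """Split a tag such as "en-GB" into (language, country).

    The language part is returned in lower case, the country part in
    upper case, or None when the tag carries no country code.
    """
    match = _TAG.fullmatch(tag)
    if match is None:
        raise ValueError("unsupported language tag: %r" % (tag,))
    language, country = match.groups()
    return language.lower(), (country.upper() if country else None)


if __name__ == "__main__":
    for tag in ("en", "en-GB", "en-US", "eng", "eng-GB", "eng-US"):
        print(tag, "->", split_language_tag(tag))
```

The same function handles 2-letter and 3-letter language codes without any 
special case, which is the point made above.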

-- 
Éric Bischoff
Received on Tuesday, 23 July 2002 19:27:36 UTC