Re: Question on implementation of the language property

At 02:02 2002 07 24 +0200, Éric Bischoff wrote:

>Okay, I've found the number of the RFC that says which code to use. It happens 
>to be the very same RFC 3066 that XSL-FO specification references!

And that my initial message referenced.

>_______________________________________________________
>   2. When a language has both an ISO 639-1 2-character code and an ISO
>      639-2 3-character code, you MUST use the tag derived from the ISO
>      639-1 2-character code.
>_______________________________________________________
>
>I was wrong when I said in my previous message that RFC 3066 gives no
>preference for one encoding or the other. Sorry for that.
>
>So the reasoning is unambiguous:
>- The specification of XSL-FO relies on RFC 3066
>- RFC 3066 gives the rules for choosing between 2-letter codes and 3-letter
>codes (if you have the choice, use the 2-letter code)
>- So documents conforming to XSL-FO should respect that rule

I'm not sure which of the following you are suggesting:

1.  that the current interpretation of XSL (since it references 3066)
    requires use of 2 letter codes when available, or

2.  you believe that the correct thing for XSL to do is to require use of
    2 letter codes when available.

I am pretty sure you are against:

3.  XSL should only allow use of 3 letter codes (i.e., prohibit use of
    2 letter codes as values for the language property)

but correct me if I am wrong.

>As I pointed out in my previous message, this rule is idiotic because it
>makes a "de facto" mixture of two code sets instead of keeping them separate.
>But as I also pointed out, large projects such as KDE have chosen to
>respect that rule.
>
>Personally, I would however allow some tolerance and accept codes like "deu" 
>and "ger", even if "de" exists.

Okay, so I gather you favor option 2 above, correct?
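
To make the tolerant reading concrete, here is a minimal sketch, assuming a
hypothetical implementation that accepts 3-letter codes on input but
normalizes them to the 2-letter form that RFC 3066 prefers. The function
name and the small lookup table are illustrative only; a real table would
have to cover all of ISO 639.

```python
# Hypothetical lookup table: an illustrative subset of ISO 639-2 -> ISO 639-1.
# "deu"/"ger" and "fra"/"fre" are the two 3-letter forms of the same language.
ISO639_2_TO_1 = {
    "deu": "de", "ger": "de",
    "eng": "en",
    "fra": "fr", "fre": "fr",
}

def normalize_language(tag):
    """Replace a known 3-letter primary subtag with its 2-letter equivalent.

    Anything after the first "-" (e.g. a country code) is passed through
    unchanged; unknown primary subtags are only lowercased.
    """
    primary, sep, rest = tag.partition("-")
    primary = primary.lower()
    primary = ISO639_2_TO_1.get(primary, primary)
    return primary + sep + rest
```

With this sketch, "deu", "ger" and "de" all end up as "de", so the rest of
the formatter only ever sees one code per language.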

> I would even allow very common constructs not
>allowed by RFC 3066:
>
>        "fr_FR" instead of "fr-FR"
>        "de-DE@euro" ('@' sign is normally illegal)

It is highly unlikely that the XSL spec would sanction use of
illegal codes.
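
For reference, the RFC 3066 syntax is a primary subtag of 1 to 8 letters,
followed by zero or more "-"-separated subtags of 1 to 8 letters or digits.
A minimal sketch of that check (the function name is hypothetical, and this
tests syntax only, not whether the codes are actually registered):

```python
import re

# RFC 3066 syntax: 1-8 letters, then any number of "-" + 1-8 letters/digits.
_RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

def is_wellformed_rfc3066(tag):
    """Return True if tag matches the RFC 3066 Language-Tag syntax."""
    return _RFC3066_TAG.fullmatch(tag) is not None
```

Under this check "fr-FR" is well-formed, while "fr_FR" and "de-DE@euro" are
rejected because "_" and "@" cannot appear in a tag.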

>> > If a given implementation accepts 2 character values (e.g., "EN"),
>> > how are they interpreted (e.g., does "EN" mean US English,
>> > British English, or something else)?
>>
>> I believe that this particular point is covered by RFC 3066, which is
>> referenced from the XSL-FO specification:
>>       "en" = English
>>       "en-GB" = British English
>>       "en-US" = American English
>>
>> Same for 3-letter codes:
>>       "eng"
>>       "eng-GB"
>>       "eng-US"
>>
>> It's independent of the length of the code ;-). The first, mandatory part is
>> the language as defined in ISO 639-1 or 639-2; the second, optional part is
>> the country code as defined in ISO 3166.
>
>Also, if you ask me "Does 'en' resolve to 'en-GB' or 'en-US'?" I would answer: 
>it looks like an implementation choice, or could be parametrized. After all, 
>when we speak about "English", do we refer to "British English" or to 
>"American English"? There seems to be no easy answer to that question. One 
>could even imagine hyphenation dictionaries permitting local variants like:
>
>        honor
>        honour
>
>-- 
>Éric Bischoff
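
A sketch of the "implementation choice, or could be parametrized" idea,
assuming a hypothetical `resolve_language` helper and a caller-supplied
table of default countries. Neither RFC 3066 nor the XSL spec defines such
a default, and treating the second subtag as a country code is a
simplification.

```python
def resolve_language(tag, default_country=None):
    """Split a tag into (language, country).

    default_country is a hypothetical, caller-supplied dict mapping a
    language code to the country to assume when the tag names none.
    """
    default_country = default_country or {}
    language, _, country = tag.partition("-")
    language = language.lower()
    if country:
        country = country.upper()
    else:
        # No country in the tag: use the configured default, or None.
        country = default_country.get(language)
    return language, country
```

A hyphenation engine could then look for an ("en", "GB") dictionary first
and fall back to a generic "en" one when the country is None.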


Thanks for your input.

paul

Received on Wednesday, 24 July 2002 09:52:20 UTC