Re: Question on implementation of the language property from Éric Bischoff on 2002-07-24 (www-xsl-fo@w3.org from July 2002)

From: Éric Bischoff <e.bischoff@noos.fr>
Date: Wed, 24 Jul 2002 20:38:12 +0200
To: Paul Grosso <pgrosso@arbortext.com>, www-xsl-fo@w3.org
Message-Id: <200207242038.12602.e.bischoff@noos.fr>

On Wednesday 24 July 2002 15:44, Paul Grosso wrote:
> >So the reasoning is unambiguous:
> >- The XSL-FO specification relies on RFC 3066
> >- RFC 3066 gives the rules for choosing between two-letter codes and
> >  three-letter codes (if you have the choice, use the two-letter code)
> >- So documents conforming to XSL-FO should respect that rule
>
> I'm not sure which of the following you are suggesting:
>
> 1.  that current interpretation of XSL (since it references 3066) requires
>     use of 2 letter codes when available, or

Yes, that's what I was suggesting.
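A minimal Python sketch of that strict reading of RFC 3066 (prefer the two-letter ISO 639-1 code whenever one exists). The mapping table is a hypothetical four-language subset of the ISO 639-2 to ISO 639-1 correspondence, and the function names are illustrative only, not part of any formatter's API.

```python
# Hypothetical subset of the ISO 639-2 -> ISO 639-1 correspondence, for
# illustration only; a real formatter would load the complete tables.
ISO639_2_TO_1 = {
    "eng": "en",
    "fra": "fr",  # ISO 639-2/T (terminology) code
    "fre": "fr",  # ISO 639-2/B (bibliographic) code
    "deu": "de",  # ISO 639-2/T
    "ger": "de",  # ISO 639-2/B
}


def preferred_primary_subtag(code: str) -> str:
    """Return the two-letter code when our (partial) table knows one,
    otherwise the code unchanged (lowercased)."""
    code = code.lower()  # language tags are case-insensitive
    return ISO639_2_TO_1.get(code, code)


def follows_rfc3066_rule(code: str) -> bool:
    """True if `code` is already the form the strict rule prefers."""
    return preferred_primary_subtag(code) == code.lower()


if __name__ == "__main__":
    for code in ("de", "deu", "ger", "en", "eng"):
        print(code, "->", preferred_primary_subtag(code), follows_rfc3066_rule(code))
```

Under this reading, "deu" and "ger" both map to "de" and are flagged as not following the rule.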

> >As I pointed out in my previous message, this rule is idiotic because
> > it creates a de facto mixture of two code sets instead of keeping them
> > separate. But as I also pointed out, very large projects such as the
> > KDE project have chosen to respect that rule.
> >
> >Personally, however, I would allow some tolerance and accept codes like
> > "deu" and "ger", even though "de" exists.
>
> Okay, so I gather you favor option 2 above, correct?

No. I meant that RFC 3066 gives a rule on what users should do, and that 
this rule answers your question about which codes should be used in XSL-FO. 
However, I think that a good formatter should allow more freedom than that, 
and be tolerant if the user does not strictly follow the (not-so-good) rules 
laid down by RFC 3066.

> > I would even allow very common constructs not
> >allowed by RFC 3066:
> >
> >        "fr_FR" instead of "fr-FR"
> >        "de-DE@euro" ('@' sign is normally illegal)
>
> It is highly unlikely that the XSL spec would sanction use of
> illegal codes.

Yes, of course. I tend to think that the amount of tolerance towards 
non-standard constructs should be left to the implementors of formatters.
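As an illustration of the kind of tolerance an implementor might choose, here is a minimal Python sketch that accepts "fr_FR", "de-DE@euro" and "deu"/"ger" and normalizes them into RFC 3066 syntax. The table is again a hypothetical partial subset, and `tolerant_language_tag` is an illustrative name, not an existing API. The syntax check follows the RFC 3066 grammar: a primary subtag of 1 to 8 letters, then hyphen-separated subtags of 1 to 8 letters or digits.

```python
import re

# Hypothetical, partial ISO 639-2 -> ISO 639-1 table (illustration only).
ISO639_2_TO_1 = {"eng": "en", "fra": "fr", "fre": "fr", "deu": "de", "ger": "de"}

# RFC 3066 syntax: 1-8 letters, then "-"-separated subtags of 1-8 alphanumerics.
RFC3066_SYNTAX = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def tolerant_language_tag(raw: str) -> str:
    """Normalize common non-standard spellings into an RFC 3066 tag."""
    tag = raw.strip()
    tag = tag.split("@", 1)[0]   # drop a POSIX locale modifier such as "@euro"
    tag = tag.replace("_", "-")  # accept POSIX-style "fr_FR"
    if not RFC3066_SYNTAX.fullmatch(tag):
        raise ValueError(f"cannot interpret {raw!r} as a language tag")
    primary, *rest = tag.split("-")
    primary = primary.lower()
    primary = ISO639_2_TO_1.get(primary, primary)  # tolerate "deu"/"ger"
    # Conventional casing: two-letter country codes in upper case.
    rest = [s.upper() if len(s) == 2 else s.lower() for s in rest]
    return "-".join([primary, *rest])


if __name__ == "__main__":
    for raw in ("fr_FR", "de-DE@euro", "ger", "deu_de", "en-gb"):
        print(raw, "->", tolerant_language_tag(raw))
```

How far to go (for instance, whether to accept "deu" at all) is exactly the policy decision left to each formatter.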

> >> > If a given implementation accepts 2 character values (e.g., "EN"),
> >> > how are they interpreted (e.g., does "EN" mean US English,
> >> > British English, or something else)?
> >>
> >> I believe that this particular point is covered by RFC 3066, which is
> >> referenced by the XSL-FO specification:
> >>       "en" = English
> >>       "en-GB" = British English
> >>       "en-US" = American English
> >>
> >> The same goes for three-letter codes:
> >>       "eng"
> >>       "eng-GB"
> >>       "eng-US"
> >>
> >> It's independent of the length of the code ;-). The first, mandatory part
> >> is the language code as defined in ISO 639-1 or ISO 639-2; the second,
> >> optional part is the country code as defined in ISO 3166.
> >
> > Also, if you ask me "Does 'en' resolve to 'en-GB' or 'en-US'?" I would
> > answer: it looks like an implementation choice, or could be parametrized.
> > After all, when we speak about "English", do we refer to "British
> > English" or to "American English"? There seems to be no easy answer to
> > that question. One could even imagine hyphenation dictionaries permitting
> > local variants like:
> >
> >        honor
> >        honour
>
> Thanks for your input.
>
> paul
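On the question of whether "en" resolves to "en-GB" or "en-US": a minimal Python sketch, assuming the resolution is treated as a configurable parameter rather than a fixed rule. It splits a tag into a mandatory language part and an optional country part (RFC 3066 reads a two-letter second subtag as an ISO 3166 country code) and fills in a missing country from a default table. The "en" to "US" default and all names here are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical default: which region a bare language code resolves to.
# This is a policy decision, so it is a parameter, not a fixed rule.
DEFAULT_REGIONS: Dict[str, str] = {"en": "US"}


def split_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Split a tag into (language, country); the country may be absent."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    country = None
    # A two-letter alphabetic second subtag is read as an ISO 3166 country code.
    if len(subtags) > 1 and len(subtags[1]) == 2 and subtags[1].isalpha():
        country = subtags[1].upper()
    return language, country


def resolve_region(
    tag: str, defaults: Optional[Dict[str, str]] = None
) -> Tuple[str, Optional[str]]:
    """Fill in a missing country from a configurable default table."""
    language, country = split_tag(tag)
    if country is None:
        table = DEFAULT_REGIONS if defaults is None else defaults
        country = table.get(language)  # stays None when no default is set
    return language, country


if __name__ == "__main__":
    print(resolve_region("en-GB"))
    print(resolve_region("en"))
    print(resolve_region("en", {"en": "GB"}))
```

Passing `{"en": "GB"}` would let a formatter choose, say, British hyphenation patterns ("honour") for a bare "en", while "en-US" still selects American ones ("honor").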

You're welcome, Paul, and sorry for any misunderstandings arising from the 
fact that English is not my native language.

-- 
Éric Bischoff

Received on Wednesday, 24 July 2002 14:37:36 UTC