Re: prefer-language tag from Mark Crispin on 1998-02-19 (ietf-charsets@w3.org from January to March 1998)

From: Mark Crispin <MRC@Panda.COM>
Date: Thu, 19 Feb 1998 13:22:16 -0800 (PST)
To: "John D. Burger" <john@mitre.org>
Cc: Marc Blanchet <Marc.Blanchet@viagenie.qc.ca>, ietf-languages@apps.ietf.org, ietf-charsets@INNOSOFT.COM
Message-id: <MailManager.887923336.28587.mrc@Ikkoku-Kan.Panda.COM>

On Thu, 19 Feb 1998 13:57:56 -0500, John D. Burger wrote:
> As I understand RFC 1766, this is expressively not allowed (Section 2.1):
>
>    There is no guaranteed relationship between languages whose tags
>    start out with the same series of subtags; especially, they are NOT
>    guraranteed [sic] to be mutually comprehensible, although this will
>    sometimes be the case.
>
>    Applications should always treat language tags as a single token; the
>    division into main tag and subtags is an administrative mechanism,
>    not a navigation aid.

Actually, the scope of RFC 1766 doesn't cover the issue in question; RFC 1766
covers language tagging of data as opposed to a user's preferred language
list.  The above statement makes sense in the RFC 1766 context; it's not clear
that it does in other contexts.

I also don't think that that section was intended to cause "FR" and "FR-FR" to
be treated as non-matching, although I understand that it can be read that
way.  It definitely implies that "FR-FR" and "FR-CA" must not match.
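
To make that reading concrete, here is a minimal sketch of one possible
matching rule.  It is an illustration only, not something RFC 1766 specifies:
two tags match if they are equal or if one is a whole-subtag prefix of the
other, compared case-insensitively.  The function name tags_match and the
choice to make the rule symmetric are assumptions made for this example.

```python
def tags_match(a: str, b: str) -> bool:
    """Hypothetical rule: tags match if they are equal, or if one is a
    prefix of the other on a subtag boundary (case-insensitive)."""
    x = a.lower().split("-")
    y = b.lower().split("-")
    # Compare the shorter subtag list against the start of the longer one.
    shorter, longer = (x, y) if len(x) <= len(y) else (y, x)
    return longer[:len(shorter)] == shorter


print(tags_match("FR", "FR-FR"))     # True
print(tags_match("FR-FR", "FR-CA"))  # False
```

Under this rule "FR" matches both "FR-FR" and "FR-CA", while "FR-FR" and
"FR-CA" do not match each other.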

So the entire issue has to be reconsidered in this context, and a new set of
rules needs to be issued.  [Of course, the RFC 1766 rules will weigh heavily
on the new rules -- we want both sets to fit together well.]

In any case, implementors should not be obliged to grope for the answer.  We
need specific rules, set down in writing.  Otherwise, implementors are in a
no-win situation.

> In particular, I suppose there are examples involving Chinese (e.g., ZH,
> ZH-TW, and ZH-CN) where the proposed behaviour is problematic.

I don't think so.  Language does not imply character set.  The ZH-TW and ZH-CN
dichotomy is mostly based upon traditional versus simplified characters.
However, educated Chinese in both the ROC and PRC can read texts in either
form, and modern Chinese language education ends up teaching both forms (so
much for simplification making education easier!).

I don't have a language code list at hand, so I can't check, but I hope that
non-Mandarin Chinese dialects (such as Cantonese and Fujian) are given their
own language codes and are not under ZH.  It would be a problem otherwise.


Received on Wednesday, 25 February 1998 00:38:59 UTC