Re: prefer-language tag from John D. Burger on 1998-02-20 (ietf-charsets@w3.org from January to March 1998)

From: John D. Burger <john@mitre.org>
Date: Fri, 20 Feb 1998 12:48:57 -0500
To: Mark Crispin <MRC@panda.com>
Cc: ietf-languages@apps.ietf.org, ietf-charsets@INNOSOFT.COM
Message-id: <34EDC209.DA4ACFFC@mitre.org>

Mark Crispin wrote:

> Actually, the scope of RFC 1766 doesn't cover the issue in question; RFC 1766
> covers language tagging of data as opposed to a user's preferred language
> list.  The above statement makes sense in the RFC 1766 context; it's not clear
> that it does in other contexts.

Hmm, I'm not sure I understand this distinction.

> I also don't think that that section was intended to cause "FR" and "FR-FR" to
> be treated as non-matching, although I understand that it can be read that
> way.  It definitely implies that "FR-FR" and "FR-CA" must not match.

I think the second paragraph I lifted from Section 2.1 is very clear:

  Applications should always treat language tags as a single token; the
  division into main tag and subtags is an administrative mechanism,
  not a navigation aid.

This says language tags are opaque and atomic, as far as application
comparison is concerned, e.g., "FR" and "FR-FR" do =not= match.  As you
suggest, however, new rules could (probably should) be formulated that specify
subtyping behavior, or fallbacks of some sort, just as MIME says that the type
"text/whatsit" can be treated as "text/plain" by applications that don't
understand "text/whatsit".
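
To make the "single token" reading concrete, here is a minimal sketch in Python.
The function name tags_match is my own invention for illustration. The only
normalization I assume is case folding, since RFC 1766 treats tags as case
insensitive:

```python
def tags_match(requested: str, available: str) -> bool:
    """Compare two language tags as opaque, atomic tokens.

    Assumption: the only normalization applied is case folding.
    The subtag structure is deliberately never inspected.
    """
    return requested.lower() == available.lower()


# "FR" and "fr" are the same token; "FR" and "FR-FR" are different tokens.
same_language = tags_match("FR", "fr")
parent_and_child = tags_match("FR", "FR-FR")
```

Under this reading a request for "FR-FR" never picks up an "FR" alternative,
since the comparison does not look inside the tag at all.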

> I don't have a language code list at hand, so I can't check, but I hope that
> non-Mandarin Chinese dialects (such as  are given their
> own language codes and are not under ZH.  It would be a problem otherwise.

As far as I can tell, there are no separate language codes for any of the
various Chinese dialects.  In general, the kind of type/subtype operations
you described can only work when the "default" dialect can be assumed to be
understood by any speaker of a non-default dialect (this has nothing to do
with i-default, hopefully).  This works for EN/EN-US/EN-GB and FR/FR-FR/FR-CA,
but not necessarily in general.  I think the Chinese dialects you mention are
indeed going to be a problem, because they're not really simply =dialects=. 
Perhaps ISO 639 should be amended to include these as separate languages, and
ZH should be labeled "Mandarin", not "Chinese".  The real problem, of course,
is that we're using the same set of codes for spoken =and= written languages,
and the two do not always align.  I also think some creoles might be
problematic (I know ISO 639 has codes for various =groups= of creoles, but I'm
not sure how useful they are).

Given all of this, I'm starting to wonder whether specifying any subtyping
behavior on language tags is appropriate - perhaps client applications should
simply stick to listing the "parent" tags in addition to any subtypes, on a
language-by-language basis, i.e., if the user has only listed "FR-CA", they
don't get "FR" from the server, even if it has such an alternative.  I guess
I'm saying that I'm not sure the table in one of your previous messages
defines appropriate behavior.
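
A rough sketch of what I mean, in Python.  The name select_alternative and the
idea of a simple ordered preference list are assumptions for illustration, not
anything taken from a spec.  The server does no subtyping at all; any fallback
has to be spelled out by the client:

```python
from typing import Optional, Sequence


def select_alternative(preferences: Sequence[str],
                       available: Sequence[str]) -> Optional[str]:
    """Return the first available tag that exactly matches a preference.

    Tags are compared as opaque tokens (case-insensitively); no parent
    tag is ever inferred from a subtag.
    """
    by_folded = {tag.lower(): tag for tag in available}
    for wanted in preferences:
        hit = by_folded.get(wanted.lower())
        if hit is not None:
            return hit
    return None


server_has = ["FR", "EN"]
only_subtag = select_alternative(["FR-CA"], server_has)        # no match
with_parent = select_alternative(["FR-CA", "FR"], server_has)  # matches "FR"
```

A user for whom the parent tag is not an acceptable substitute simply leaves it
out of the list.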

An alternative is for IETF to reserve the subtag "default", e.g., FR-default,
and specify that dialects can drop down to this.  That way, the subtyping or
fallback behavior can be controlled on a language-by-language basis.  This
won't force servers to include twice as many alternatives, of course: if the
"main" language tag can drop down to the default, then the server need only
maintain the latter (default) version of the data, e.g., FR-default can match
FR as well as FR-FR and FR-CA.
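
Here is a small Python sketch of how a server might apply this.  Bear in mind
that the "default" subtag is only my proposal, not a registered subtag, and
the function name select_with_default is made up for the example.  An exact
match is tried first; failing that, the request drops down to the
"<main tag>-default" alternative, if the server holds one:

```python
from typing import Optional, Sequence

# Hypothetical reserved subtag from the proposal; not part of RFC 1766.
DEFAULT_SUBTAG = "default"


def select_with_default(requested: str,
                        available: Sequence[str]) -> Optional[str]:
    """Pick an alternative for one requested tag.

    1. An exact (case-insensitive) match wins.
    2. Otherwise fall back to "<main tag>-default", if present.
    A language with no "-default" alternative gets no fallback.
    """
    by_folded = {tag.lower(): tag for tag in available}
    wanted = requested.lower()
    if wanted in by_folded:
        return by_folded[wanted]
    main_tag = wanted.split("-", 1)[0]
    return by_folded.get(main_tag + "-" + DEFAULT_SUBTAG)


server_has = ["FR-default", "ZH"]
french_canadian = select_with_default("FR-CA", server_has)  # "FR-default"
plain_french = select_with_default("FR", server_has)        # "FR-default"
chinese_dialect = select_with_default("ZH-TW", server_has)  # None
```

Since plain "ZH" is not marked as a default, a request for a ZH subtag does not
drop down to it, which is the language-by-language control I have in mind.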

Is this a viable solution?

- John Burger

Received on Tuesday, 24 February 1998 20:41:29 UTC