Re: prefer-language tag

On Fri, 20 Feb 1998 12:48:57 -0500, John D. Burger wrote:
> > the scope of RFC 1766 doesn't cover the issue in question; RFC 1766
> > covers language tagging of data as opposed to a user's preferred language
> > list.  The above statement makes sense in the RFC 1766 context; it's not
> > clear that it does in other contexts.
> Hmm, I'm not sure I understand this distinction.

In RFC 1766, we are labelling data.  That is, we are saying "this data is in
the following language(s)".

What we want with a "preferred language" is to say "a human's language
preference is the following language(s)".

These are two very different concepts.

> > I also don't think that that section was intended to cause "FR" and
> > "FR-FR" to be treated as non-matching, although I understand that it can
> > be read that way.  It definitely implies that "FR-FR" and "FR-CA" must not
> > match.
> I think the second paragraph I lifted from Section 2.1 is very clear:
>   Applications should always treat language tags as a single token; the
>   division into main tag and subtags is an administrative mechanism,
>   not a navigation aid.

Once again, you have to consider the two cases of
	This data is in the following language(s).
	A human's language preference is the following language(s).
and recognize that RFC 1766 is talking about the first case.  The second case
isn't defined at all (that's what we're trying to do now!).

We are using the same syntax and registration for "preferred language" as is
used for RFC 1766 "language tags", but the semantics of "preferred language"
are not the same.

In "language tags", "FR, DE" means "this content is in French *and* German."
In "preferred language", "FR, DE" means "I prefer to get my text in French,
failing that, in German" with an implicit fallthrough to i-default.

These are new semantics, and thus need a new definition.
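
As a rough sketch of the difference between the two readings (Python, with
made-up function names; the only thing taken from RFC 1766 is that tags
compare case-insensitively, and the fallthrough to i-default is the behavior
proposed above, not an existing definition):

```python
# Hypothetical illustration only: neither function is defined by any RFC.

def content_languages(tags):
    """'Language tag' reading: the data is in ALL of the listed languages."""
    return {t.strip().lower() for t in tags.split(",")}


def pick_preferred(preference, available, default="i-default"):
    """'Preferred language' reading: the list is ordered, and the first
    listed language that the server can supply wins. If none of them is
    available, fall through to i-default."""
    offered = {t.lower() for t in available}
    for tag in (t.strip().lower() for t in preference.split(",")):
        if tag in offered:
            return tag
    return default
```

With these, content_languages("FR, DE") is an unordered set of two languages,
while pick_preferred("FR, DE", ["DE", "EN"]) selects German only because
French was not on offer.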

> As far as I can tell, there are no separate language codes for any of the
> various Chinese dialects.

We should verify this.

Chinese dialects are not mutually intelligible; Mandarin and Cantonese are
more different than, say, Swedish and Norwegian or Spanish and Italian.

> Perhaps ISO 639  should be amended to include these as separate languages,
> and ZH should be labeled "Mandarin", not "Chinese".

We should assume that ISO specifications are out of our control and immutable;
but we should also make sure there's a problem to begin with.  I have only
ever seen ZH-TW and ZH-CN in use, although ZH-HK and ZH-SG appear in some
software.

ZH-TW and ZH-CN are definitely Mandarin.

I know that Cantonese exists as a written language, but I'm not sure how much
of a difference it makes.  In particular, I *think* that written Mandarin is
intelligible to a Cantonese speaker, but I'm not sure if that's a natural
occurrence or if it's due to education (kids in China and Taiwan are forced to
learn Mandarin in school).  I don't know if written Cantonese is intelligible
to a Mandarin-only speaker.

Hopefully, a native Chinese speaker can comment on this topic more
intelligently.

> Given all of this, I'm starting to wonder whether specifying any subtyping
> behavior on language tags is appropriate - perhaps client applications
> should simply stick to listing the "parent" tags in addition to any
> subtypes, on a language-by-language basis, i.e., if the user has only listed
> "FR-CA", they don't get "FR" from the server, even if it has such an
> alternative.

My concern has to do with interoperability.  How should a French Canadian user
configure his client so that it works with an arbitrary server that speaks
French?

How should a server implementor offer French?  What does he do if he only
offers one form of French?  What does he do if he offers both French Canadian
and French French?  [I don't know if this is a major issue in French, but it
definitely is for Spanish.]
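
To make the interoperability question concrete, here is a sketch of the two
candidate matching rules under discussion (Python; the function names are
invented for illustration, and neither rule is specified anywhere yet):

```python
# Hypothetical illustration only: two possible server-side matching rules.

def match_exact(preferred, offered):
    """Treat every tag as a single token, as RFC 1766 Section 2.1 asks
    for data labels. The client must list parent tags itself."""
    have = {t.lower() for t in offered}
    for tag in (t.lower() for t in preferred):
        if tag in have:
            return tag
    return None


def match_with_parent(preferred, offered):
    """Assumed alternative: after each listed tag, also try its main tag,
    so a request for FR-CA can be satisfied by a plain FR offering."""
    have = {t.lower() for t in offered}
    for tag in (t.lower() for t in preferred):
        if tag in have:
            return tag
        parent = tag.split("-")[0]
        if parent in have:
            return parent
    return None
```

Under match_exact, a client that lists only "FR-CA" gets nothing from a
server offering "FR"; under match_with_parent it gets "fr". Neither rule
helps when the client lists "FR-CA" and the server offers only "FR-FR", which
is the case a definition still has to settle.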

> An alternative is for IETF to reserve the subtag "default", e.g.,
> FR-default, and specify that dialects can drop down to this.

I'm not sure that I understand what "FR-default" buys us over "FR".

Can you give an example of how "FR-default" and "FR" would work differently?



Received on Tuesday, 24 February 1998 22:57:26 UTC