- From: John D. Burger <john@mitre.org>
- Date: Fri, 20 Feb 1998 12:48:57 -0500
- To: Mark Crispin <MRC@panda.com>
- Cc: ietf-languages@apps.ietf.org, ietf-charsets@INNOSOFT.COM
Mark Crispin wrote:

> Actually, the scope of RFC 1766 doesn't cover the issue in question; RFC 1766
> covers language tagging of data as opposed to a user's preferred language
> list. The above statement makes sense in the RFC 1766 context; it's not clear
> that it does in other contexts.

Hmm, I'm not sure I understand this distinction.

> I also don't think that that section was intended to cause "FR" and "FR-FR" to
> be treated as non-matching, although I understand that it can be read that
> way. It definitely implies that "FR-FR" and "FR-CA" must not match.

I think the second paragraph I lifted from Section 2.1 is very clear:

    Applications should always treat language tags as a single token; the
    division into main tag and subtags is an administrative mechanism, not
    a navigation aid.

This says language tags are opaque and atomic, as far as application
comparison is concerned, e.g., "FR" and "FR-FR" do =not= match. As you
suggest, however, new rules could (probably should) be formulated that
specify subtyping behavior, or fallbacks of some sort, just as MIME says
that the type "text/whatsit" can be treated as "text/plain" by
applications that don't understand "text/whatsit".

> I don't have a language code list at hand, so I can't check, but I hope that
> non-Mandarin Chinese dialects (such as are given their
> own language codes and are not under ZH. It would be a problem otherwise.

As far as I can tell, there are no separate language codes for any of the
various Chinese dialects. In general, the kind of type/subtype operations
you described can only work when the "default" dialect can be assumed to
be understood by any speaker of a non-default dialect (this has nothing to
do with i-default, hopefully). This works for EN/EN-US/EN-UK and
FR/FR-FR/FR-CA, but not necessarily in general. I think the Chinese
dialects you mention are indeed going to be a problem, because they're not
really simply =dialects=.
Perhaps ISO 639 should be amended to include these as separate languages,
and ZH should be labeled "Mandarin", not "Chinese". The real problem, of
course, is that we're using the same set of codes for spoken =and= written
languages, and the two do not always align. I also think some creoles
might be problematic (I know ISO 639 has codes for various =groups= of
creoles, but I'm not sure how useful they are).

Given all of this, I'm starting to wonder whether specifying any subtyping
behavior on language tags is appropriate - perhaps client applications
should simply stick to listing the "parent" tags in addition to any
subtypes, on a language-by-language basis, i.e., if the user has only
listed "FR-CA", they don't get "FR" from the server, even if it has such
an alternative. I guess I'm saying that I'm not sure the table in one of
your previous messages defines appropriate behavior.

An alternative is for IETF to reserve the subtag "default", e.g.,
FR-default, and specify that dialects can drop down to this. That way, the
subtyping or fallback behavior can be controlled on a language-by-language
basis. This won't force servers to include twice as many alternatives, of
course: if the "main" language tag can drop down to the default, then the
server need only maintain the latter (default) version of the data, e.g.,
FR-default can match FR as well as FR-FR and FR-CA.

Is this a viable solution?

- John Burger
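
A minimal sketch of the two matching behaviors discussed in this message,
written in Python. The function names are hypothetical, and the "default"
subtag rule is only the proposal made here, not anything RFC 1766 defines.
The one thing taken from RFC 1766 itself is that tags are compared as whole
tokens, without regard to case.

```python
def exact_match(requested: str, available: str) -> bool:
    """Opaque-token reading of RFC 1766 Section 2.1.

    Tags match only if they are identical as whole tokens
    (case-insensitively); subtags are never interpreted.
    """
    return requested.lower() == available.lower()


def default_fallback_match(requested: str, available: str) -> bool:
    """Hypothetical rule for the proposed reserved subtag "default".

    An available tag "X-default" matches a request for "X" or for any
    "X-<subtag>". Anything else falls back to exact matching, so a
    language with no "-default" alternative gets no fallback at all.
    """
    req = requested.lower()
    avail = available.lower()
    if req == avail:
        return True
    primary, _, subtag = avail.partition("-")
    if subtag != "default":
        return False
    return req == primary or req.startswith(primary + "-")
```

Under the opaque-token reading, exact_match("FR", "FR-FR") is false. Under
the proposed rule, a server holding only "FR-default" would satisfy requests
for "FR", "FR-FR" and "FR-CA", while a request for "FR-CA" would still not
match a plain "FR" alternative.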
Received on Tuesday, 24 February 1998 20:41:29 UTC