Clarification on language tag usage in latest draft: take 2

[ The list server made a mess of my previous attempt to send this ]
[ message (it does not seem to like MIME headers, or too many of them). ]
[ Hopefully this one will be clearer. ]

I just read the new RFC 1766 on language tags and found something in
it that seems to clash with the intended usage in HTTP.  Section 2.1
of RFC 1766 contains the following:

   Applications should always treat language tags as a single token; the
   division into main tag and subtags is an administrative mechanism,
   not a navigation aid.

I take this to mean that the tags are not hierarchical, i.e. that "en"
is not to be understood as a superset of "en-US".  This may be fine for
Content-Language, but will not work, IMHO, for Accept-Language.

My interpretation is that asking for "en-US" will NOT get you an "en"
document if "en-US" is not available.  That may be OK.

Likewise, asking for "en" will NOT get you "en-US", "en-GB" or any
other "en-SOMETHING".  This seems to me to be unacceptable.  It means
that the naive user would have to be aware of all variants of "en-*"
in existence in order to construct the simple request "Send anything
in English".  Same for any other language that has variants.

Here is a suggestion to fix this.  Simply add the following to section
8.2 of the HTTP draft, just before the Note:

  In the context of the Accept-Language header (section 5.4.4) a
  language tag is not to be interpreted as a single token, as per RFC
  1766, but as a hierarchy.  A server should consider that it has a
  match when a language tag received in an Accept-Language header
  matches the initial portion of the language tag of a document.  An
  exact match should be preferred.  This interpretation allows a
  browser to send, for example:

    Accept-Language: en-US, en

  when the intent is to access, in order of preference, documents in
  American-English ("en-US"), 'plain' or 'international' English
  ("en"), and any other variant of English (initial "en-").

I think the above is preferable to changing RFC 1766 and hacking up a
scheme such as "en-*", and is still flexible enough.  Any comments?
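
To make the proposed matching rule concrete, here is a rough sketch in
Python.  It is only an illustration, not part of the proposed draft text.
The function names (tag_matches, pick_language) are hypothetical, and it
rests on several assumptions of mine: tags compare case-insensitively, a
prefix match must end on a "-" boundary (so "en" matches "en-US" but not
"eng"), the header is a plain comma-separated list with no quality
values, and each requested tag is tried in the order given.

```python
def tag_matches(requested, available):
    """True if `requested` equals `available` or is an initial portion of it.

    Assumption: the initial portion must end at a "-" subtag boundary.
    """
    req = requested.strip().lower()
    doc = available.strip().lower()
    return doc == req or doc.startswith(req + "-")


def pick_language(accept_language, available_tags):
    """Return the first available tag that satisfies the header, or None.

    Assumption: `accept_language` is a bare comma-separated list of tags,
    listed in order of preference, with no quality values.
    """
    requested = [t.strip() for t in accept_language.split(",") if t.strip()]
    for req in requested:
        # An exact match is preferred over a prefix match.
        for doc in available_tags:
            if doc.strip().lower() == req.lower():
                return doc
        # Otherwise accept any document tag that begins with the request.
        for doc in available_tags:
            if tag_matches(req, doc):
                return doc
    return None
```

With "Accept-Language: en-US, en" and documents available in "en-GB" and
"fr", pick_language returns "en-GB": neither requested tag has an exact
match, but "en" is the initial portion of "en-GB".  Under the
single-token reading of RFC 1766, the same request would match nothing.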

-- 
Francois Yergeau <yergeau@alis.ca>
Alis Technologies Inc., Montreal
+1 (514) 738-9171
+1 (514) 342-0318




Received on Thursday, 16 March 1995 12:30:19 UTC