Re: Clarification on language tag usage in latest draft

> I just read the new RFC 1766 on language tags and found something in
> it that seems to clash with the intended usage in HTTP.  Section 2.1
> of RFC 1766 contains the following:
> 
>    Applications should always treat language tags as a single token; the
>    division into main tag and subtags is an administrative mechanism,
>    not a navigation aid.
> 
> I take that to mean that the tags are not hierarchical, i.e. that "en"
> is not to be understood as a superset of "en-US".  This may be fine for
> Content-Language, but will not work, IMHO, for Accept-Language.
> 
> My interpretation is that asking for "en-US" will NOT get you an "en"
> document if "en-US" is not available.  That may be OK.
> 
> Likewise, asking for "en" will NOT get you "en-US", "en-UK" or any
> other "en-SOMETHING".  This seems to me to be unacceptable.  It means
> that the naive user would have to be aware of all variants of "en-*"
> in existence in order to construct the simple request "Send anything
> in English".  Same for any other language that has variants.

Yes, this also struck me as a bit odd.  Fortunately, I had a note from
the author that essentially said "do the right thing -- treat it as
hierarchical".  I have no idea what was behind that section of RFC 1766.
I just ignored it for the HTTP spec, but I guess an explicit statement
would be preferable.

> Here is a suggestion to fix this.  Simply add the following to section
> 8.2 of the HTTP draft, just before the Note:
> 
>   In the context of the Accept-Language header (section 5.4.4) a
>   language tag is not to be interpreted as a single token, as per RFC
>   1766, but as a hierarchy.  A server should consider that it has a
>   match when a language tag received in an Accept-Language header
>   matches the initial portion of the language tag of a document.  An
>   exact match should be preferred.  This interpretation allows a
>   browser to send, for example:
> 
>     Accept-Language: en-US, en
> 
>   when the intent is to access, in order of preference, documents in
>   American-English ("en-US"), 'plain' or 'international' English
>   ("en"), and any other variant of English (initial "en-").
> 
> I think the above is preferable to changing RFC 1766 and hacking up a
> scheme such as "en-*", and is still flexible enough.  Any comments?

Yes, that will do quite nicely -- I'll add it to the next revision
if there are no strong objections.
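
To make the proposed rule concrete, here is a minimal Python sketch of
how a server might apply it. This is only an illustration, not text for
the draft. The function names (parse_accept_language, tag_matches,
choose) are hypothetical, and the sketch makes three assumptions beyond
the quoted wording: the header is a plain comma-separated list in
preference order, "initial portion" means a match that ends on a subtag
boundary (so "en" matches "en-US" but not "eng"), and the exact-match
preference is applied separately for each requested tag. Tags are
compared case-insensitively, as RFC 1766 requires.

```python
def parse_accept_language(header_value):
    """Split a header value such as "en-US, en" into a list of tags."""
    return [t.strip() for t in header_value.split(",") if t.strip()]


def tag_matches(requested, document):
    """True if `requested` equals `document` or is an initial portion of it.

    Assumption: the initial portion must end on a subtag boundary.
    """
    r, d = requested.lower(), document.lower()
    return d == r or d.startswith(r + "-")


def choose(requested_tags, available_tags):
    """Return the best available document tag, or None if nothing matches.

    Requested tags are tried in the order given. For each one, an exact
    match is preferred over a prefix (hierarchical) match.
    """
    for wanted in requested_tags:
        # Exact match first.
        for tag in available_tags:
            if tag.lower() == wanted.lower():
                return tag
        # Then any document whose tag starts with the requested tag.
        for tag in available_tags:
            if tag_matches(wanted, tag):
                return tag
    return None


if __name__ == "__main__":
    wanted = parse_accept_language("en-US, en")
    print(choose(wanted, ["en-GB", "en", "en-US"]))
    print(choose(wanted, ["fr", "en-GB"]))
    print(choose(["en-US"], ["en"]))
```

Under these assumptions, a request for "en-US" alone does not fall back
to an "en" document, while a request for "en" accepts any "en-*"
variant, which is the behaviour discussed above.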


......Roy Fielding   ICS Grad Student, University of California, Irvine  USA
                                     <fielding@ics.uci.edu>
                     <URL:http://www.ics.uci.edu/dir/grad/Software/fielding>

Received on Thursday, 16 March 1995 13:08:12 UTC