RE: [Ltru] Proposed resolution for Issue 13 (language tags)

From: Phillips, Addison <addison@amazon.com> · Date: Sat, 18 Jul 2009 08:45:07 -0700

Hello Julian,

I'm glad to see this note. My thoughts follow:

> 1) The exact wording of the "summary",

I think your wording is generally good. There are a couple of minor points to make.

> HTTP uses language tags within the Accept-Language and Content-Language fields.

Not quite. It uses language tags in Content-Language. Accept-Language uses language *ranges*, which are currently defined by RFC 4647, although I would tend not to change this text here to note that fact. Section 5.4 can cover that.
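To illustrate the tag/range distinction, here is a minimal Python sketch that splits an Accept-Language field value into language ranges with their quality values. The function name `parse_accept_language` is hypothetical, and the parsing is deliberately simplified: it assumes the only parameter is `q` and skips items whose `q` value is malformed. The point is that a range such as "*" is legal here even though it is not a language tag.

```python
def parse_accept_language(value):
    """Split an Accept-Language field value into (range, q) pairs, highest q first.

    Simplified sketch: assumes each item is "range" or "range;q=number".
    """
    ranges = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        rng, _, params = item.partition(";")
        q = 1.0  # a missing q parameter defaults to 1
        params = params.strip()
        if params.lower().startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                continue  # assumption: silently skip items with a malformed q
        ranges.append((rng.strip(), q))
    # sorted() is stable, so ranges with equal q keep their header order
    return sorted(ranges, key=lambda pair: pair[1], reverse=True)
```

For example, `parse_accept_language("*;q=0.1, fr")` yields the range "fr" ahead of the wildcard range "*", which could never appear in Content-Language.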

> 2) whether we're referring the right ABNF production (does it need
> to be "obs-language-tag" instead, or both), and

I don't think it should be both. That would be confusing and lead to interoperability issues. 

I think that, ideally, you would use the new production rather than obs-language-tag. While obs-language-tag is more permissive, language tags that match it (but not language-tag) have never been valid. And many of the most common invalid values happen to match both productions.

> 3) the examples

There is nothing wrong with the set of examples you have, although "x-pig-latin" is suspect :-) and the list is somewhat eclectic. The list in RFC 2616 was:

  en, en-US, en-cockney, i-cherokee, x-pig-latin

This list had the advantage of being somewhat obvious to English speakers without additional annotation. (Note that the Cherokee tag was never actually valid!!) Perhaps some values from 4646bis Appendix B would be suitable. I suggest a carefully constructed list, such as:

====
   Example tags include:

   en (English)
   en-US (English, United States)
   en-US-x-pig-latin (English, United States, private use subtags)
   hy-Latn-IT-arevela (Armenian, Latin script, Italy, eastern variant)
   es-419 (Spanish, Latin America)
====

There is one additional issue related to language tags that you don't appear to be tracking separately. I don't think that the current incarnation of Section 5.4 (Accept-Language) is quite right. In particular, the Basic Filtering algorithm is made normative as the language negotiation strategy. Personally, I find the Lookup algorithm a better and more common choice, and others might make different choices depending on the application. I think this text should depend more on the text in RFC 4647, rather than trying to recreate it in a shorter form. If HTTP-WG feels that an algorithm must be made normative, then my personal opinion is that it should be Lookup, not Filtering.
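To make the difference between the two schemes concrete, here is a rough Python sketch of Basic Filtering and Lookup as I read RFC 4647. The function names `basic_filter` and `lookup` are hypothetical, the inputs are assumed to be well-formed ranges and tags, and this is an illustration rather than a complete implementation of either algorithm.

```python
def basic_filter(ranges, tags):
    """Basic Filtering sketch: return every tag matched by any range.

    A range matches a tag if it equals the tag, or is a prefix of the tag
    that is followed by "-". The range "*" matches every tag.
    Comparison is case-insensitive.
    """
    result = []
    for rng in ranges:
        r = rng.lower()
        for tag in tags:
            t = tag.lower()
            if (r == "*" or t == r or t.startswith(r + "-")) and tag not in result:
                result.append(tag)
    return result


def lookup(ranges, tags, default=None):
    """Lookup sketch: return the single best tag, or the default.

    Each range is truncated from the right, one subtag at a time, until it
    equals an available tag. The range "*" is skipped.
    """
    by_lower = {t.lower(): t for t in tags}
    for rng in ranges:
        if rng == "*":
            continue
        subtags = rng.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
            # A single-character subtag (such as "x") is removed together
            # with the subtag that followed it.
            while subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default
```

With available content tagged "en", "en-US", and "de", a request for the range "en-US-x-pig-latin" gets nothing from `basic_filter` (no available tag is at least that specific) but gets "en-US" from `lookup` (the range is truncated until it fits). Conversely, the range "en" yields both "en" and "en-US" under filtering but only "en" under lookup.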

Addison

PS> I have blind-copied the public-i18n-core@w3.org list on this message because that group has an interest in this topic but I don't want to expand the cross-posting of the thread.

Addison Phillips
Globalization Architect -- Lab126
Chair -- W3C Internationalization WG

Internationalization is not a feature.
It is an architecture.

> -----Original Message-----
> From: ltru-bounces@ietf.org [mailto:ltru-bounces@ietf.org] On
> Behalf Of Julian Reschke
> Sent: Saturday, July 18, 2009 4:58 AM
> To: HTTP Working Group; LTRU Working Group
> Subject: Re: [Ltru] Proposed resolution for Issue 13 (language tags)
> 
> Julian Reschke wrote:
> >
> > OK,
> >
> > thanks for all the feedback so far. I (hopefully) have addressed many of
> > the issues; here's the new proposed text for 3.5:
> > ...
> 
> We stopped to discuss this
> (<http://trac.tools.ietf.org/wg/httpbis/trac/ticket/13>) 15 months ago;
> in the meantime RFC4646bis went through many revisions, and now is
> approved and in the RFC Editor queue.
> 
> I have updated the proposed change for HTTPbis Part 3 accordingly; see
> <http://trac.tools.ietf.org/wg/httpbis/trac/attachment/ticket/13/i13.4.diff>.
> 
> The full text now would read:
> 
> --
> 2.4.  Language Tags
> 
>     A language tag, as defined in [RFC4646bis], identifies a natural
>     language spoken, written, or otherwise conveyed by human beings for
>     communication of information to other human beings.  Computer
>     languages are explicitly excluded.  HTTP uses language tags within
>     the Accept-Language and Content-Language fields.
> 
>     In summary, a language tag is composed of one or more parts: A
>     primary language subtag followed by a possibly empty series of
>     subtags:
> 
>       language-tag = <Language-Tag, defined in [RFC4646bis], Section 2.1>
> 
>     White space is not allowed within the tag and all tags are
>     case-insensitive.  The name space of language subtags is
>     administered by the IANA (see
>     <http://www.iana.org/assignments/language-subtag-registry>).
> 
>     Example tags include:
> 
>       en, en-US, es-419, az-Arab, x-pig-latin, man-Nkoo-GN
> 
>     See [RFC4646bis] for further information.
> --
> 
> I understand that back in April 2008 we still discussed various details,
> such as
> 
> 1) The exact wording of the "summary",
> 2) whether we're referring the right ABNF production (does it need to be
>   "obs-language-tag" instead, or both), and
> 3) the examples
> 
> (the discussion is archived around
> <http://lists.w3.org/Archives/Public/ietf-http-wg/2008AprJun/0210.html>)
> 
> I'd really like to close this one finally, so feedback from the language
> tag experts would be appreciated.
> 
> BR, Julian