Re: HTTP/1.1 comments from Larry Masinter on 1996-05-15 (ietf-http-wg@w3.org from April to June 1996)

From: Larry Masinter <masinter@parc.xerox.com>
Date: Wed, 15 May 1996 09:19:10 PDT
To: Harald.T.Alvestrand@uninett.no
Cc: mduerst@ifi.unizh.ch, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <96May15.091915pdt.2733@golden.parc.xerox.com>

The HTML-I18N draft contains an extension to language tags, namely the
addition of languages from the "Ethnologue". The HTTP/1.1 draft refers
to RFC 1766 without any additions. In fact, HTTP probably needs the
same extensions that HTML needs!

Could we suggest a revised RFC 1766 that contains the addition of
Ethnologue 3-letter language tags (and possibly hierarchical matching)
and leave these things out of html-i18n and http-v11?

Note how http-v11 and html-i18n have already drifted slightly out of
sync: for example, the HTTP editing group felt it necessary to point
out that x-pig-latin is not actually a registered tag.

================================================================
From draft-ietf-html-i18n-03.txt:
================================================================
3. The LANG attribute

   Language tags can be used to control rendering of a marked up docu-
   ment in various ways: character disambiguation, in cases where the
   character encoding is not sufficient to resolve to a specific glyph;
   quotation marks; hyphenation; ligatures; spacing; voice synthesis;
   etc.  Independently of rendering issues, language markup is useful as
   content markup for purposes such as classification and searching.

   Since any text can logically be assigned a language, almost all HTML
   elements admit the LANG attribute.  The DTD reflects this.  It is
   also intended that any new element introduced in later versions of
   HTML will admit the LANG attribute, unless there is a good reason not
   to do so.

   The language attribute, LANG, takes as its value a language tag that
   identifies a natural language spoken, written, or otherwise conveyed
   by human beings for communication of information to other human
   beings. Computer languages are explicitly excluded.

   The syntax and registry of HTML language tags is the same as that
   defined by RFC 1766 [RFC1766]. In summary, a language tag is composed
   of one or more parts: A primary language tag and a possibly empty
   series of subtags:

        language-tag  = primary-tag *( "-" subtag )
        primary-tag   = 1*8ALPHA
        subtag        = 1*8ALPHA

   Whitespace is not allowed within the tag and all tags are case-
   insensitive. The namespace of language tags is administered by the
   IANA. Example tags include:

       en, en-US, en-cockney, i-cherokee, x-pig-latin

   Two-letter primary-tags are reserved for ISO 639 language abbrevia-
   tions [ISO-639], and three-letter primary-tags for the language
   abbreviations of the "Ethnologue" [ETHNO] (the latter is in addition
   to the requirements of RFC 1766). Any two-letter initial subtag is an
   ISO 3166 country code [ISO-3166].
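
As a rough illustration of the grammar and reservations quoted above
(not part of either draft), a tag could be checked and classified along
these lines. The function names and the returned dictionary layout are
hypothetical, and the three-letter "Ethnologue" rule applies only to the
html-i18n text, not to RFC 1766 or http-v11:

```python
import re

# Grammar quoted above: primary-tag *( "-" subtag ), each part 1*8ALPHA.
_TAG_RE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def parse_language_tag(tag):
    """Return the lowercased parts of a tag, or None if it breaks the grammar.

    Tags are case-insensitive, so the parts are normalized to lower case.
    """
    if not _TAG_RE.fullmatch(tag):
        return None
    return tag.lower().split("-")


def describe(tag):
    """Classify a tag using only the reservations stated in the html-i18n text."""
    parts = parse_language_tag(tag)
    if parts is None:
        return None
    primary = parts[0]
    if len(primary) == 2:
        kind = "iso-639"        # two-letter primary-tag
    elif len(primary) == 3:
        kind = "ethnologue"     # html-i18n extension only
    else:
        kind = "other"
    # A two-letter initial subtag is read as an ISO 3166 country code.
    country = parts[1] if len(parts) > 1 and len(parts[1]) == 2 else None
    return {"primary": primary, "kind": kind,
            "country": country, "subtags": parts[1:]}


print(parse_language_tag("en-US"))   # ['en', 'us']
print(parse_language_tag("en US"))   # None (whitespace is not allowed)
```

This checks syntax only; it does not consult the ISO 639, ISO 3166,
Ethnologue, or IANA lists, so a well-formed but unregistered tag still
passes.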

   In the context of HTML, a language tag is not to be interpreted as a
   single token, as per RFC 1766, but as a hierarchy. For example, a
   user agent that adjusts rendering according to language should con-
   sider that it has a match when a language tag in a style sheet entry
   matches the initial portion of the language tag of an element. An
   exact match should be preferred. This interpretation allows an ele-
   ment marked up as, for instance, "en-US" to trigger styles corre-
   sponding to, in order of preference, US-English ("en-US") or 'plain'
   or 'international' English ("en").

        NOTE -- using the language tag as a hierarchy does not
        imply that all languages with a common prefix will be
        understood by those fluent in one or more of those lan-
        guages; it simply allows the user to request this commonal-
        ity when it is true for that user.
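
The hierarchical interpretation described above might be sketched as
follows. This is only an illustration, not text from the draft: the
function names are hypothetical, and it assumes that "initial portion"
means a prefix ending on a subtag boundary (so "e" does not match "en"):

```python
def matches(entry_tag, element_tag):
    """True if entry_tag equals element_tag or is a whole-subtag prefix of it."""
    entry = entry_tag.lower().split("-")
    element = element_tag.lower().split("-")
    return element[:len(entry)] == entry


def best_match(element_tag, entry_tags):
    """Pick the most specific matching entry, so an exact match wins."""
    hits = [t for t in entry_tags if matches(t, element_tag)]
    return max(hits, key=lambda t: len(t.split("-")), default=None)


print(best_match("en-US", ["en", "en-US", "fr"]))  # en-US
print(best_match("en-US", ["en", "fr"]))           # en
print(best_match("en-US", ["en-GB"]))              # None
```

Preferring the entry with the most subtags is one way to realize "an
exact match should be preferred"; other tie-breaking rules are possible.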

   The rendering of elements may be affected by the LANG attribute.  For
   any element, the value of the LANG attribute overrides the value
   specified by the LANG attribute of any enclosing element and the
   value (if any) of the HTTP Content-Language header. If none of these
   are set, a suitable default, perhaps controlled by user preferences,
   by automatic context analysis or by the user's locale, should be used
   to control rendering.
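
The precedence in the paragraph above could be sketched like this. The
function and parameter names are hypothetical, the list of LANG values is
assumed to run from the outermost element to the innermost, and the
Content-Language value is treated as a single tag for simplicity:

```python
def effective_language(lang_chain, content_language=None, default="en"):
    """Resolve the language that governs rendering of an element.

    lang_chain holds the LANG values from the outermost enclosing element
    down to the element itself; None marks an element with no LANG set.
    """
    # The innermost LANG attribute overrides those of enclosing elements.
    for lang in reversed(lang_chain):
        if lang:
            return lang
    # Next comes the HTTP Content-Language header, then a default that
    # might come from user preferences or the user's locale.
    return content_language or default


print(effective_language(["en", None, "fr-CA"]))        # fr-CA
print(effective_language([None], content_language="de"))  # de
```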

================================================================
From draft-ietf-http-v11-spec-03.txt:
================================================================
7.10 Language Tags
A language tag identifies a natural language spoken, written, or
otherwise conveyed by human beings for communication of information to
other human beings. Computer languages are explicitly excluded. HTTP
uses language tags within the Accept-Language, and Content-Language
fields.

The syntax and registry of HTTP language tags is the same as that
defined by RFC 1766. In summary, a language tag is composed of 1 or
more parts: A primary language tag and a possibly empty series of
subtags:

        language-tag  = primary-tag *( "-" subtag )

        primary-tag   = 1*8ALPHA
        subtag        = 1*8ALPHA

Whitespace is not allowed within the tag and all tags are case-
insensitive. The name space of language tags is administered by the
IANA. Example tags include:

       en, en-US, en-cockney, i-cherokee, x-pig-latin

where any two-letter primary-tag is an ISO 639 language abbreviation and
any two-letter initial subtag is an ISO 3166 country code.  (The last
three tags above are not registered tags; all but the last are examples
of tags which could be registered in future.)
Received on Wednesday, 15 May 1996 09:21:03 UTC