- From: Larry Masinter <masinter@parc.xerox.com>
- Date: Wed, 15 May 1996 09:19:10 PDT
- To: Harald.T.Alvestrand@uninett.no
- Cc: mduerst@ifi.unizh.ch, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
The HTML-I18N draft contains an extension to language tags, namely the
addition of languages from the "Ethnologue". The HTTP/1.1 draft refers
to RFC 1766 without any additions. In fact, HTTP probably needs the
same extensions that HTML needs!
Could we suggest a revised RFC 1766 that contains the addition of
Ethnologue 3-letter language tags (and possibly hierarchical matching)
and leave these things out of html-i18n and http-v11?
Note how HTTP-v11 and html-i18n have gotten slightly out of sync,
e.g., the http editing group felt it necessary to point out that
x-pig-latin was not actually registered.
================================================================
From draft-ietf-html-i18n-03.txt:
================================================================
3. The LANG attribute
Language tags can be used to control rendering of a marked-up
document in various ways: character disambiguation, in cases where the
character encoding is not sufficient to resolve to a specific glyph;
quotation marks; hyphenation; ligatures; spacing; voice synthesis;
etc. Independently of rendering issues, language markup is useful as
content markup for purposes such as classification and searching.
Since any text can logically be assigned a language, almost all HTML
elements admit the LANG attribute. The DTD reflects this. It is
also intended that any new element introduced in later versions of
HTML will admit the LANG attribute, unless there is a good reason not
to do so.
The language attribute, LANG, takes as its value a language tag that
identifies a natural language spoken, written, or otherwise conveyed
by human beings for communication of information to other human
beings. Computer languages are explicitly excluded.
The syntax and registry of HTML language tags are the same as those
defined by RFC 1766 [RFC1766]. In summary, a language tag is composed
of one or more parts: A primary language tag and a possibly empty
series of subtags:
language-tag = primary-tag *( "-" subtag )
primary-tag = 1*8ALPHA
subtag = 1*8ALPHA
Whitespace is not allowed within the tag and all tags are
case-insensitive. The namespace of language tags is administered by
the IANA. Example tags include:
en, en-US, en-cockney, i-cherokee, x-pig-latin
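The grammar above is small enough to check with one regular expression. Below is a minimal sketch, assuming ALPHA means the ASCII letters only; the names is_language_tag and normalize are hypothetical helpers, not part of either draft or of RFC 1766.

```python
import re

# The language-tag grammar quoted above, as a regular expression:
#   language-tag = primary-tag *( "-" subtag )
#   primary-tag  = 1*8ALPHA
#   subtag       = 1*8ALPHA
# ALPHA is taken to mean the ASCII letters A-Z and a-z; no whitespace allowed.
_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_language_tag(text: str) -> bool:
    """Return True if the whole of `text` matches the language-tag rule."""
    return _LANGUAGE_TAG.fullmatch(text) is not None


def normalize(tag: str) -> str:
    """Fold a valid tag to lower case, since tags are case-insensitive."""
    if not is_language_tag(tag):
        raise ValueError(f"not a language tag: {tag!r}")
    return tag.lower()


if __name__ == "__main__":
    # The example tags from the draft are all syntactically valid.
    for tag in ["en", "en-US", "en-cockney", "i-cherokee", "x-pig-latin"]:
        assert is_language_tag(tag), tag
    # Rejected: embedded space, empty subtag, digits, a 9-letter subtag.
    for bad in ["en US", "en-", "de-1996", "en-abcdefghi", ""]:
        assert not is_language_tag(bad), bad
    print(normalize("en-US"))  # en-us
```

Comparing normalized forms is one simple way to apply the case-insensitivity rule: normalize("EN-us") and normalize("en-US") are equal.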
Two-letter primary-tags are reserved for ISO 639 language
abbreviations [ISO-639], and three-letter primary-tags for the language
abbreviations of the "Ethnologue" [ETHNO] (the latter is in addition
to the requirements of RFC 1766). Any two-letter initial subtag is an
ISO 3166 country code [ISO-3166].
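To make the html-i18n rules concrete, here is a sketch that labels where each part of a tag gets its meaning. It assumes the tag has already passed the grammar check. The two- and three-letter cases follow the paragraph above; the "i" (IANA-registered) and "x" (private use) cases come from RFC 1766 itself. TagParts and describe are hypothetical names.

```python
from typing import NamedTuple, Optional


class TagParts(NamedTuple):
    primary: str
    primary_source: str       # where the primary tag's meaning comes from
    country: Optional[str]    # ISO 3166 code if the first subtag has two letters


def describe(tag: str) -> TagParts:
    """Classify a language tag under the rules quoted from html-i18n-03.

    Assumes `tag` already satisfies the language-tag grammar.
    """
    primary, *subtags = tag.lower().split("-")
    if len(primary) == 2:
        source = "ISO 639"
    elif len(primary) == 3:
        source = "Ethnologue"      # html-i18n addition, not in RFC 1766
    elif primary == "i":
        source = "IANA registry"   # reserved by RFC 1766
    elif primary == "x":
        source = "private use"     # reserved by RFC 1766
    else:
        source = "unassigned"
    # Only a two-letter *first* subtag is read as an ISO 3166 country code.
    country = subtags[0].upper() if subtags and len(subtags[0]) == 2 else None
    return TagParts(primary, source, country)
```

Under the HTTP draft's wording, the three-letter branch would have no defined meaning, which is one of the differences the proposed RFC 1766 revision would remove.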
In the context of HTML, a language tag is to be interpreted not as a
single token (as per RFC 1766) but as a hierarchy. For example, a
user agent that adjusts rendering according to language should
consider that it has a match when a language tag in a style sheet
entry matches the initial portion of the language tag of an element.
An exact match should be preferred. This interpretation allows an
element marked up as, for instance, "en-US" to trigger styles
corresponding to, in order of preference, US-English ("en-US") or
'plain' or 'international' English ("en").
NOTE -- using the language tag as a hierarchy does not
imply that all languages with a common prefix will be
understood by those fluent in one or more of those
languages; it simply allows the user to request this
commonality when it is true for that user.
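The hierarchical matching described above can be sketched as "longest matching prefix wins". This assumes that "initial portion" means a run of whole subtags, so "en" matches "en-US" but not "eng"; the draft does not say this explicitly. prefix_matches and best_rule are hypothetical names.

```python
from typing import List, Optional


def prefix_matches(rule_tag: str, element_tag: str) -> bool:
    """True if rule_tag's subtags are an initial run of element_tag's subtags.

    Comparison is case-insensitive, and only whole subtags are compared.
    """
    rule = rule_tag.lower().split("-")
    element = element_tag.lower().split("-")
    return element[: len(rule)] == rule


def best_rule(element_tag: str, rule_tags: List[str]) -> Optional[str]:
    """Pick the most specific style-sheet tag that matches element_tag.

    An exact match has as many subtags as the element's tag, more than any
    shorter prefix, so choosing the longest match also prefers an exact match.
    """
    matches = [t for t in rule_tags if prefix_matches(t, element_tag)]
    if not matches:
        return None
    return max(matches, key=lambda t: len(t.split("-")))
```

With style entries for "en" and "en-US", an element tagged "en-US" picks "en-US", while one tagged "en-GB" falls back to "en".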
The rendering of elements may be affected by the LANG attribute. For
any element, the value of the LANG attribute overrides the value
specified by the LANG attribute of any enclosing element and the
value (if any) of the HTTP Content-Language header. If none of these
are set, a suitable default, perhaps controlled by user preferences,
by automatic context analysis or by the user's locale, should be used
to control rendering.
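The precedence rule above (own LANG, then enclosing LANG, then Content-Language, then a default) is a first-non-empty search. A minimal sketch follows, assuming the caller supplies the LANG values from the element outward to the root and that the Content-Language header carries a single tag; effective_language and its parameters are hypothetical.

```python
from typing import Optional, Sequence


def effective_language(
    lang_chain: Sequence[Optional[str]],
    content_language: Optional[str],
    default: str,
) -> str:
    """Resolve the language that should control rendering of an element.

    lang_chain holds the LANG attribute of the element itself first, then
    of each enclosing element out to the root; None means "not set".
    """
    for lang in lang_chain:
        if lang:
            return lang                # nearest LANG attribute wins
    if content_language:
        return content_language        # then the HTTP Content-Language header
    return default                     # e.g. from user preferences or locale
```

For example, an element with no LANG inside a paragraph marked "fr", in a document served with Content-Language "de", resolves to "fr".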
================================================================
from draft-ietf-http-v11-spec-03.txt
================================================================
7.10 Language Tags
A language tag identifies a natural language spoken, written, or
otherwise conveyed by human beings for communication of information to
other human beings. Computer languages are explicitly excluded. HTTP
uses language tags within the Accept-Language and Content-Language
fields.
The syntax and registry of HTTP language tags are the same as those
defined by RFC 1766. In summary, a language tag is composed of 1 or
more parts: A primary language tag and a possibly empty series of
subtags:
language-tag = primary-tag *( "-" subtag )
primary-tag = 1*8ALPHA
subtag = 1*8ALPHA
Whitespace is not allowed within the tag and all tags are
case-insensitive. The name space of language tags is administered by
the IANA. Example tags include:
en, en-US, en-cockney, i-cherokee, x-pig-latin
where any two-letter primary-tag is an ISO 639 language abbreviation and
any two-letter initial subtag is an ISO 3166 country code. (The last
three tags above are not registered tags; all but the last are examples
of tags which could be registered in future.)
Received on Wednesday, 15 May 1996 09:21:03 UTC