- From: Larry Masinter <masinter@parc.xerox.com>
- Date: Wed, 15 May 1996 09:19:10 PDT
- To: Harald.T.Alvestrand@uninett.no
- Cc: mduerst@ifi.unizh.ch, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
The HTML-I18N draft contains an extension to language tags, namely the addition of languages from the "Ethnologue". The HTTP/1.1 draft refers to RFC 1766 without any additions. In fact, HTTP probably needs the same extensions that HTML needs!

Could we suggest a revised RFC 1766 that contains the addition of Ethnologue 3-letter language tags (and possibly hierarchical matching) and leave these things out of html-i18n and http-v11?

Note how HTTP-v11 and html-i18n have gotten slightly out of sync, e.g., the http editing group felt it necessary to point out that x-pig-latin was not actually registered.

================================================================
From draft-ietf-html-i18n-03.txt:
================================================================

3. The LANG attribute

Language tags can be used to control rendering of a marked up document in various ways: character disambiguation, in cases where the character encoding is not sufficient to resolve to a specific glyph; quotation marks; hyphenation; ligatures; spacing; voice synthesis; etc. Independently of rendering issues, language markup is useful as content markup for purposes such as classification and searching.

Since any text can logically be assigned a language, almost all HTML elements admit the LANG attribute. The DTD reflects this. It is also intended that any new element introduced in later versions of HTML will admit the LANG attribute, unless there is a good reason not to do so.

The language attribute, LANG, takes as its value a language tag that identifies a natural language spoken, written, or otherwise conveyed by human beings for communication of information to other human beings. Computer languages are explicitly excluded.

The syntax and registry of HTML language tags is the same as that defined by RFC 1766 [RFC1766].
In summary, a language tag is composed of one or more parts: A primary language tag and a possibly empty series of subtags:

     language-tag  = primary-tag *( "-" subtag )
     primary-tag   = 1*8ALPHA
     subtag        = 1*8ALPHA

Whitespace is not allowed within the tag and all tags are case-insensitive. The namespace of language tags is administered by the IANA. Example tags include:

     en, en-US, en-cockney, i-cherokee, x-pig-latin

Two-letter primary-tags are reserved for ISO 639 language abbreviations [ISO-639], and three-letter primary-tags for the language abbreviations of the "Ethnologue" [ETHNO] (the latter is in addition to the requirements of RFC 1766). Any two-letter initial subtag is an ISO 3166 country code [ISO-3166].

In the context of HTML, a language tag is not to be interpreted as a single token, as per RFC 1766, but as a hierarchy. For example, a user agent that adjusts rendering according to language should consider that it has a match when a language tag in a style sheet entry matches the initial portion of the language tag of an element. An exact match should be preferred. This interpretation allows an element marked up as, for instance, "en-US" to trigger styles corresponding to, in order of preference, US-English ("en-US") or 'plain' or 'international' English ("en").

     NOTE -- using the language tag as a hierarchy does not imply that all languages with a common prefix will be understood by those fluent in one or more of those languages; it simply allows the user to request this commonality when it is true for that user.

The rendering of elements may be affected by the LANG attribute. For any element, the value of the LANG attribute overrides the value specified by the LANG attribute of any enclosing element and the value (if any) of the HTTP Content-Language header. If none of these are set, a suitable default, perhaps controlled by user preferences, by automatic context analysis or by the user's locale, should be used to control rendering.
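
To make the hierarchical reading concrete, here is a rough Python sketch of the matching and override rules quoted from html-i18n. It is not taken from either draft. The function names (prefix_matches, best_style, effective_lang) are made up for illustration, and it assumes that "initial portion" means a prefix on whole subtags, so that "en" matches "en-US" but not a hypothetical tag "eng".

```python
def prefix_matches(style_tag, element_tag):
    """True if style_tag equals the leading subtags of element_tag.

    Assumption: matching is done per subtag and case-insensitively.
    """
    style_parts = style_tag.lower().split("-")
    element_parts = element_tag.lower().split("-")
    return element_parts[:len(style_parts)] == style_parts


def best_style(element_tag, style_tags):
    """Pick the most specific matching style sheet tag, or None.

    The longest matching prefix wins, so an exact match is preferred.
    """
    candidates = [t for t in style_tags if prefix_matches(t, element_tag)]
    return max(candidates, key=lambda t: len(t.split("-")), default=None)


def effective_lang(own, enclosing, content_language, default):
    """Resolve the language that governs rendering of one element.

    own              -- the element's LANG value, or None
    enclosing        -- LANG values of enclosing elements, nearest first
    content_language -- HTTP Content-Language value, or None
    default          -- fallback (user preference, locale, ...)
    """
    for value in [own, *enclosing, content_language]:
        if value:
            return value
    return default
```

With style sheet entries for "en", "en-US" and "fr", best_style("en-US", ...) picks "en-US"; with only "en" and "fr" it falls back to "en". Under plain RFC 1766 single-token matching, the second case would find nothing.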
================================================================
from draft-ietf-http-v11-spec-03.txt
================================================================

7.10 Language Tags

A language tag identifies a natural language spoken, written, or otherwise conveyed by human beings for communication of information to other human beings. Computer languages are explicitly excluded. HTTP uses language tags within the Accept-Language, and Content-Language fields.

The syntax and registry of HTTP language tags is the same as that defined by RFC 1766. In summary, a language tag is composed of 1 or more parts: A primary language tag and a possibly empty series of subtags:

     language-tag  = primary-tag *( "-" subtag )
     primary-tag   = 1*8ALPHA
     subtag        = 1*8ALPHA

Whitespace is not allowed within the tag and all tags are case-insensitive. The name space of language tags is administered by the IANA. Example tags include:

     en, en-US, en-cockney, i-cherokee, x-pig-latin

where any two-letter primary-tag is an ISO 639 language abbreviation and any two-letter initial subtag is an ISO 3166 country code. (The last three tags above are not registered tags; all but the last are examples of tags which could be registered in future.)
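
The point where the two drafts diverge can be shown with a small Python sketch. Again this is only an illustration under stated assumptions: classify and its ethnologue flag are hypothetical names, the code checks only the shape of a tag (it does not consult the ISO 639, ISO 3166 or Ethnologue lists), and the meaning of the "i" and "x" primary tags comes from RFC 1766 rather than from the text quoted here.

```python
import re

# Shape of a tag per the grammar shared by both drafts:
# 1 to 8 letters, then zero or more "-" plus 1 to 8 letters.
TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def classify(tag, ethnologue=False):
    """Return (kind, country) for a syntactically valid tag.

    ethnologue=True applies the html-i18n extension (three-letter
    primary tags); ethnologue=False follows the HTTP/1.1 wording.
    """
    if not TAG.fullmatch(tag):
        raise ValueError("not a valid language tag: %r" % tag)
    parts = tag.lower().split("-")
    primary = parts[0]
    if len(primary) == 2:
        kind = "iso-639"
    elif primary == "i":
        kind = "iana"          # IANA-registered, per RFC 1766
    elif primary == "x":
        kind = "private"       # private use, per RFC 1766
    elif len(primary) == 3 and ethnologue:
        kind = "ethnologue"    # only under the html-i18n rules
    else:
        kind = "other"
    # A two-letter first subtag is read as an ISO 3166 country code.
    has_country = len(parts) > 1 and len(parts[1]) == 2
    return kind, (parts[1].upper() if has_country else None)
```

A three-letter primary tag is the only case where the two readings disagree: classify("eng", ethnologue=True) reports an Ethnologue tag, while classify("eng") has no category for it. A revised RFC 1766 would remove the need for the flag.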
Received on Wednesday, 15 May 1996 09:21:03 UTC