- From: Felix Sasaki <fsasaki@w3.org>
- Date: Tue, 18 Oct 2005 19:27:08 +0900
- To: w3c-xml-cg@w3.org, public-xml-core-wg@w3.org, "w3c-xml-schema-wg@w3.org" <w3c-xml-schema-wg@w3.org>
- Cc: "member-i18n-core@w3.org" <member-i18n-core@w3.org>
Hello XML Core, XML Schema and XML CG Working Groups,

This mail is just to inform you that the IESG approved RFC3066bis, the revision of RFC 3066 "Tags for the Identification of Languages". The revision was undertaken mainly by Addison Phillips, chair of the i18n core working group, and Mark Davis (IBM). The document is not yet in its final location, but you can find a copy at [1].

[1] http://www.ietf.org/mail-archive/web/ltru/current/msg03949.html

Below is a summary of the changes from RFC 3066, taken from RFC 3066bis. I will keep you informed of further developments.

Best regards,
Felix Sasaki (team contact of i18n core)

The main goals for this revision of language tags were the following:

*Compatibility.* All RFC 3066 language tags (including those in the IANA registry) remain valid in this specification. The changes in this document represent additional constraints on language tags. That is, in no case is the syntax more permissive, and processors based on the ABNF and other provisions of RFC 3066 (such as those described in [XMLSchema]) will be able to process the tags described by this document. In addition, this document defines language tags in such a way as to ensure future compatibility.

*Stability.* Because of past changes in the underlying ISO standards, a valid RFC 3066 language tag could become invalid or have its meaning changed. This has the potential to invalidate content that may have an extensive shelf life. In this specification, once a language tag is valid, it remains valid forever.

*Validity.* The structure of language tags defined by this document makes it possible to determine if a particular tag is well-formed without regard for the actual content or "meaning" of the tag as a whole. This is important because the registry grows and underlying standards change over time. In addition, it must be possible to determine if a tag is valid (or not) for a given point in time in order to provide reproducible, testable results.
This process must not be error-prone; otherwise implementations might give different results. By having an authoritative registry with specific versioning information, the validity of language tags at any point in time can be precisely determined (instead of interpolating values from many separate sources).

*Utility.* It is sometimes important to be able to differentiate between written forms of a language -- for many implementations this is more important than distinguishing between the spoken variants of a language. Languages are written in a wide variety of different scripts, so this document provides for the generative use of ISO 15924 script codes. Like the generative use of ISO language and country codes in RFC 3066, this allows combinations to be produced without resorting to the registration process. The addition of UN M.49 codes provides for the generation of language tags with regional scope, which is also required by some applications.

The recast of the registry from containing whole language tags to containing subtags is a key part of this. An important feature of RFC 3066 was that it allowed generative use of subtags. This allows people to meaningfully use generated tags, without the delays in registering whole tags or the need to register all of the combinations that might be useful.

The choice of placing the extended language and script subtags between the primary language and region subtags was widely debated. This design was chosen because the prevalent matching and content negotiation schemes rely on the subtags being arranged in order of increasing specificity. That is, the subtags that mark a greater barrier to mutual intelligibility appear left-most in a tag. For example, when selecting content written in Azerbaijani, the script (Arabic, Cyrillic, or Latin) represents a greater barrier to understanding than any regional variations (those associated with Azerbaijan or Iran, for example).
Individuals who prefer documents in a particular script, but can deal with the minor regional differences, can therefore select appropriate content. Applications that do not deal with written content will continue to omit these subtags.

*Extensibility.* Because of the widespread use of language tags, it is disruptive to have periodic revisions of the core specification, even in the face of demonstrated need. The extension mechanism provides a way for independent RFCs to define extensions to language tags. These extensions have a very constrained, well-defined structure that prevents extensions from interfering with implementations of language tags defined in this document.

The document also anticipates features of ISO 639-3 with the addition of the extended language subtags, as well as the possibility of other ISO 639 parts becoming useful for the formation of language tags in the future.

The use and definition of private use tags has also been modified to allow people to use private use subtags to extend or modify defined tags and to move as much information as possible out of private use and into the regular structure.

The goal for each of these modifications is to reduce or eliminate the need for future revisions of this document.

The specific changes in this document to meet these goals are:

o Defines the ABNF and rules for subtags so that the category of all subtags can be determined without reference to the registry.
o Adds the concept of well-formed vs. validating processors, defining the rules by which an implementation can claim to be one or the other.
o Replaces the IANA language tag registry with a language subtag registry that provides a complete list of valid subtags in the IANA registry. This allows for robust implementation and ease of maintenance. The language subtag registry becomes the canonical source for forming language tags.
o Provides a process that guarantees stability of language tags, by handling reuse of values by ISO 639, ISO 15924, and ISO 3166 in the event that they register a previously used value for a new purpose.
o Allows ISO 15924 script code subtags and allows them to be used generatively. Defines a method for indicating in the registry when script subtags are necessary for a given language tag.
o Adds the concept of a variant subtag and allows variants to be used generatively.
o Adds the ability to use a class of UN M.49 tags for supra-national regions and to resolve conflicts in the assignment of ISO 3166 codes.
o Defines the private use tags in ISO 639, ISO 15924, and ISO 3166 as the mechanism for creating private use language, script, and region subtags respectively.
o Adds a well-defined extension mechanism.
o Defines an extended language subtag, possibly for use with certain anticipated features of ISO 639-3.
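
The first change in the list, determining the category of every subtag from the ABNF alone, can be illustrated with a short sketch. The Python below is hypothetical: the names `classify` and `fallback_chain` are illustrative, not from any library, and the subtag shapes are an assumed, simplified reading of the RFC 3066bis ABNF (2-3 or 5-8 letter language, up to three 3-letter extended language subtags, 4-letter script, 2-letter or 3-digit region, variants of 5-8 alphanumerics or a digit plus three alphanumerics, singleton-introduced extensions, and an "x" private use sequence). Grandfathered tags, the reserved 4-letter language form, and duplicate-subtag checks are left out, so this only approximates a well-formedness check, not a validating processor. `fallback_chain` is likewise an assumption: a simple right-to-left truncation that shows why ordering subtags by increasing specificity helps matching.

```python
import re
from typing import List, Tuple

# Hypothetical sketch: subtag shapes follow a simplified reading of the
# RFC 3066bis ABNF. Grandfathered tags, the reserved 4-letter language
# form, and duplicate variant/singleton checks are NOT handled.
LANGUAGE = re.compile(r"[A-Za-z]{2,3}|[A-Za-z]{5,8}")
EXTLANG = re.compile(r"[A-Za-z]{3}")
SCRIPT = re.compile(r"[A-Za-z]{4}")
REGION = re.compile(r"[A-Za-z]{2}|[0-9]{3}")
VARIANT = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")
SINGLETON = re.compile(r"[0-9A-WY-Za-wy-z]")  # one alphanumeric, but not x/X
EXT_SUBTAG = re.compile(r"[A-Za-z0-9]{2,8}")
PRIV_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def classify(tag: str) -> List[Tuple[str, str]]:
    """Label each subtag by shape and position only (no registry lookup).

    Raises ValueError if the tag is not well-formed under the rules above.
    """
    parts = tag.split("-")
    out: List[Tuple[str, str]] = []
    i = 0
    if parts[0].lower() != "x":  # not a tag made only of private use
        if not LANGUAGE.fullmatch(parts[0]):
            raise ValueError(f"bad language subtag: {parts[0]!r}")
        out.append(("language", parts[0]))
        i = 1
        # Up to three extended language subtags, only after a 2-3 letter language.
        if len(parts[0]) <= 3:
            while i < len(parts) and i <= 3 and EXTLANG.fullmatch(parts[i]):
                out.append(("extlang", parts[i]))
                i += 1
        if i < len(parts) and SCRIPT.fullmatch(parts[i]):
            out.append(("script", parts[i]))
            i += 1
        if i < len(parts) and REGION.fullmatch(parts[i]):
            out.append(("region", parts[i]))
            i += 1
        while i < len(parts) and VARIANT.fullmatch(parts[i]):
            out.append(("variant", parts[i]))
            i += 1
        # Each extension is a singleton followed by one or more 2-8 char subtags.
        while i < len(parts) and SINGLETON.fullmatch(parts[i]):
            singleton = parts[i].lower()
            i += 1
            start = i
            while i < len(parts) and EXT_SUBTAG.fullmatch(parts[i]):
                out.append((f"extension-{singleton}", parts[i]))
                i += 1
            if i == start:
                raise ValueError(f"empty extension after {singleton!r}")
    # Private use: "x" followed by one or more 1-8 char subtags, always last.
    if i < len(parts) and parts[i].lower() == "x":
        i += 1
        start = i
        while i < len(parts) and PRIV_SUBTAG.fullmatch(parts[i]):
            out.append(("privateuse", parts[i]))
            i += 1
        if i == start:
            raise ValueError("empty private use sequence")
    if i != len(parts):
        raise ValueError(f"unexpected subtag: {parts[i]!r}")
    return out


def fallback_chain(tag: str) -> List[str]:
    """Hypothetical truncation fallback: drop the right-most subtag each time,
    also dropping a singleton that would be left dangling at the end."""
    parts = tag.split("-")
    chain: List[str] = []
    while parts:
        chain.append("-".join(parts))
        parts = parts[:-1]
        if parts and len(parts[-1]) == 1:
            parts = parts[:-1]
    return chain


print(classify("az-Arab-IR"))
# [('language', 'az'), ('script', 'Arab'), ('region', 'IR')]
print(fallback_chain("az-Arab-IR"))
# ['az-Arab-IR', 'az-Arab', 'az']
```

Because the script subtag sits to the left of the region, dropping subtags from the right keeps "az-Arab" ahead of plain "az" in the chain. This matches the Azerbaijani example above: a reader who needs Arabic script but tolerates regional differences still gets script-appropriate content first.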
Received on Tuesday, 18 October 2005 10:27:30 UTC