- From: Felix Sasaki <fsasaki@w3.org>
- Date: Tue, 18 Oct 2005 19:27:08 +0900
- To: w3c-xml-cg@w3.org, public-xml-core-wg@w3.org, "w3c-xml-schema-wg@w3.org" <w3c-xml-schema-wg@w3.org>
- Cc: "member-i18n-core@w3.org" <member-i18n-core@w3.org>
Hello XML Core, XML Schema and XML CG Working Groups,
This mail is just to inform you that the IESG approved RFC3066bis, the
revision of RFC 3066 "Tags for the Identification of Languages". The
revision was undertaken mainly by Addison Phillips, chair of the i18n core
working group, and Mark Davis (IBM). The document is not yet in its final
location, but you can find a copy at
[1] http://www.ietf.org/mail-archive/web/ltru/current/msg03949.html
Below is a summary of the changes from RFC 3066, taken from RFC
3066bis. I will keep you informed of further developments.
Best regards,
Felix Sasaki (team contact of i18n core)
The main goals for this revision of language tags were the following:
*Compatibility.* All RFC 3066 language tags (including those in the
IANA registry) remain valid in this specification. The changes in
this document represent additional constraints on language tags.
That is, in no case is the syntax more permissive, and processors
based on the ABNF and other provisions of RFC 3066 (such as those
described in [XMLSchema]) will be able to process the tags described
by this document. In addition, this document defines language tags
in such a way as to ensure future compatibility.
*Stability.* Because of changes in the past in the underlying ISO
standards, a valid RFC 3066 language tag could become invalid or have
its meaning change. This has the potential of invalidating content
that may have an extensive shelf-life. In this specification, once a
language tag is valid, it remains valid forever.
*Validity.* The structure of language tags defined by this document
makes it possible to determine if a particular tag is well-formed
without regard for the actual content or "meaning" of the tag as a
whole. This is important because the registry grows and underlying
standards change over time. In addition, it must be possible to
determine if a tag is valid (or not) for a given point in time in
order to provide reproducible, testable results. This process must
not be error-prone; otherwise implementations might give different
results. By having an authoritative registry with specific
versioning information, the validity of language tags at any point in
time can be precisely determined (instead of interpolating values
from many separate sources).
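As a rough illustration of checking well-formedness from structure alone, the following Python sketch tests a tag against a simplified pattern. The pattern is an assumption made for this example and is not the ABNF of RFC 3066bis: it covers only a primary language subtag, an optional four-letter script, an optional region (two letters or three digits), and trailing variants, and it omits extended language, extension, and private use subtags. The name is_well_formed and the assumed lengths for the language and variant subtags are hypothetical.

```python
import re

# Simplified, illustrative pattern (an assumption for this sketch, not the
# normative ABNF). No registry lookup is involved: only the shape and order
# of the subtags are examined.
_SIMPLE_TAG = re.compile(
    r"""
    [A-Za-z]{2,3}                                   # primary language (assumed 2-3 letters)
    (?:-[A-Za-z]{4})?                               # optional script, 4 letters
    (?:-(?:[A-Za-z]{2}|[0-9]{3}))?                  # optional region: 2 letters or 3 digits
    (?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*  # variants (assumed shapes)
    """,
    re.VERBOSE,
)


def is_well_formed(tag: str) -> bool:
    """Return True if the tag fits the simplified structure above."""
    return _SIMPLE_TAG.fullmatch(tag) is not None


if __name__ == "__main__":
    for candidate in ["az-Latn-AZ", "es-419", "en-US-Latn", "az--Latn"]:
        print(candidate, is_well_formed(candidate))
```

A tag that passes this check is only well-formed. Whether each subtag is actually registered is a separate, validating step that needs the registry.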
*Utility.* It is sometimes important to be able to differentiate
between written forms of a language -- for many implementations this
is more important than distinguishing between the spoken variants of
a language. Languages are written in a wide variety of different
scripts, so this document provides for the generative use of ISO
15924 script codes. Like the generative use of ISO language and
country codes in RFC 3066, this allows combinations to be produced
without resorting to the registration process. The addition of UN
M.49 codes provides for the generation of language tags with regional
scope, which is also required by some applications.
The recast of the registry from containing whole language tags to
subtags is a key part of this. An important feature of RFC 3066 was
that it allowed generative use of subtags. This allows people to
meaningfully use generated tags, without the delays in registering
whole tags or the need to register all of the combinations that might
be useful.
The choice of placing the extended language and script subtags
between the primary language and region subtags was widely debated.
This design was chosen because the prevalent matching and content
negotiation schemes rely on the subtags being arranged in order of
increasing specificity. That is, the subtags that mark a greater
barrier to mutual intelligibility appear left-most in a tag. For
example, when selecting content written in Azerbaijani, the script
(Arabic, Cyrillic, or Latin) represents a greater barrier to
understanding than any regional variations (those associated with
Azerbaijan or Iran, for example). Individuals who prefer documents
in a particular script, but can deal with the minor regional
differences, can therefore select appropriate content. Applications
that do not deal with written content will continue to omit these
subtags.
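The effect of this ordering on prefix-based matching can be sketched as follows. The function applies the language-range rule of RFC 3066 (a range matches a tag if it equals the tag, or equals a prefix of the tag that is followed by "-"). The function name and the list of available tags are made up for this illustration.

```python
def matches(language_range: str, tag: str) -> bool:
    """Prefix match on whole subtags, compared case-insensitively."""
    r = language_range.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


# Hypothetical set of available content, labelled with language tags.
available = ["az-Arab-IR", "az-Cyrl-AZ", "az-Latn-AZ", "az-Latn"]

# A reader who wants Latin script but accepts any regional variety.
latin = [tag for tag in available if matches("az-Latn", tag)]

# A reader who accepts any Azerbaijani content.
any_az = [tag for tag in available if matches("az", tag)]

if __name__ == "__main__":
    print(latin)
    print(any_az)
```

Had the region come before the script (a hypothetical ordering such as az-AZ-Latn), the range az-Latn would not be a prefix of that tag, and selecting by script alone would need more than a prefix comparison.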
*Extensibility.* Because of the widespread use of language tags, it
is disruptive to have periodic revisions of the core specification,
even in the face of demonstrated need. The extension mechanism
provides a way for independent RFCs to define extensions to
language tags. These extensions have a very constrained, well-defined
structure that prevents extensions from interfering with
implementations of language tags defined in this document.
The document also anticipates features of ISO 639-3 with the addition
of the extended language subtags, as well as the possibility of other
ISO 639 parts becoming useful for the formation of language tags in
the future.
The use and definition of private use tags have also been modified, to
allow people to use private use subtags to extend or modify defined
tags and to move as much information as possible out of private use
and into the regular structure.
The goal for each of these modifications is to reduce or eliminate
the need for future revisions of this document.
The specific changes in this document to meet these goals are:
o Defines the ABNF and rules for subtags so that the category of all
subtags can be determined without reference to the registry.
o Adds the concept of well-formed vs. validating processors,
defining the rules by which an implementation can claim to be one
or the other.
o Replaces the IANA language tag registry with a language subtag
registry that provides a complete list of valid subtags in the
IANA registry. This allows for robust implementation and ease of
maintenance. The language subtag registry becomes the canonical
source for forming language tags.
o Provides a process that guarantees stability of language tags, by
handling reuse of values by ISO 639, ISO 15924, and ISO 3166 in
the event that they register a previously used value for a new
purpose.
o Allows ISO 15924 script code subtags and allows them to be used
generatively. Defines a method for indicating in the registry
when script subtags are necessary for a given language tag.
o Adds the concept of a variant subtag and allows variants to be
used generatively.
o Adds the ability to use a class of UN M.49 tags for supra-national
regions and to resolve conflicts in the assignment of ISO 3166
codes.
o Defines the private use tags in ISO 639, ISO 15924, and ISO 3166
as the mechanism for creating private use language, script, and
region subtags respectively.
o Adds a well-defined extension mechanism.
o Defines an extended language subtag, possibly for use with certain
anticipated features of ISO 639-3.
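To illustrate the first item in the list above, here is a sketch of assigning a category to each subtag from its length, content, and position alone, with no registry lookup. The rules encoded here are simplified assumptions for this example (four letters for a script, two letters or three digits for a region, five to eight alphanumerics or a digit followed by three alphanumerics for a variant). Extended language, extension, and private use subtags are not handled, and the function name classify is hypothetical.

```python
def classify(tag):
    """Return (subtag, category) pairs using only subtag shape and position."""
    first, *rest = tag.split("-")
    result = [(first, "language")]
    stage = 0  # 0: script still allowed, 1: region still allowed, 2: variants only
    for sub in rest:
        ok = sub.isascii() and sub.isalnum()
        if ok and stage == 0 and len(sub) == 4 and sub.isalpha():
            kind, stage = "script", 1
        elif ok and stage <= 1 and (
            (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit())
        ):
            kind, stage = "region", 2
        elif ok and (
            5 <= len(sub) <= 8 or (len(sub) == 4 and sub[0].isdigit())
        ):
            kind, stage = "variant", 2
        else:
            # Shape or position does not fit any category in this sketch.
            kind = "unknown"
        result.append((sub, kind))
    return result


if __name__ == "__main__":
    for example in ["az-Latn-AZ", "es-419", "de-1996", "en-US-Latn"]:
        print(example, classify(example))
```

Because the category follows from shape and position, a processor that only checks well-formedness can parse a tag even when it contains subtags registered after the processor was written.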
Received on Tuesday, 18 October 2005 10:27:30 UTC