Development of RFC 3066bis from Felix Sasaki on 2005-10-18 (public-xml-core-wg@w3.org from October 2005)

From: Felix Sasaki <fsasaki@w3.org>
Date: Tue, 18 Oct 2005 19:27:08 +0900
To: w3c-xml-cg@w3.org, public-xml-core-wg@w3.org, "w3c-xml-schema-wg@w3.org" <w3c-xml-schema-wg@w3.org>
Cc: "member-i18n-core@w3.org" <member-i18n-core@w3.org>
Message-ID: <op.syt7bib9x1753t@ibm-60d333fc0ec>

Hello XML Core, XML Schema and XML CG Working Groups,

This mail is just to inform you that the IESG approved RFC3066bis, the  
revision of RFC 3066 "Tags for the Identification of Languages". The  
revision was undertaken mainly by Addison Phillips, chair of the i18n core  
working group, and Mark Davis (IBM). The document is not yet in its final  
location, but you can find a copy at

[1] http://www.ietf.org/mail-archive/web/ltru/current/msg03949.html

Below is a summary of the changes from RFC 3066, taken from RFC  
3066bis. I will keep you informed of further developments.

Best regards,

Felix Sasaki (team contact of i18n core)

The main goals for this revision of language tags were the following:

    *Compatibility.* All RFC 3066 language tags (including those in the
    IANA registry) remain valid in this specification.  The changes in
    this document represent additional constraints on language tags.
    That is, in no case is the syntax more permissive and processors
    based on the ABNF and other provisions of RFC 3066 (such as those
    described in [XMLSchema]) will be able to process the tags described
    by this document.  In addition, this document defines language tags
    in such a way as to ensure future compatibility.

    *Stability.* Because of changes in the past in the underlying ISO
    standards, a valid RFC 3066 language tag could become invalid or have
    its meaning change.  This has the potential of invalidating content
    that may have an extensive shelf-life.  In this specification, once a
    language tag is valid, it remains valid forever.

    *Validity.* The structure of language tags defined by this document
    makes it possible to determine if a particular tag is well-formed
    without regard for the actual content or "meaning" of the tag as a
    whole.  This is important because the registry grows and underlying
    standards change over time.  In addition, it must be possible to
    determine if a tag is valid (or not) for a given point in time in
    order to provide reproducible, testable results.  This process must
    not be error-prone; otherwise implementations might give different
    results.  By having an authoritative registry with specific
    versioning information, the validity of language tags at any point in
    time can be precisely determined (instead of interpolating values
    from many separate sources).
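The distinction between well-formed and valid can be illustrated with a short Python sketch. It checks only the shape of a tag and consults no registry. The pattern is a deliberate simplification and an assumption of this example: it covers language, optional script, optional region, and variants, and leaves out extended language subtags, extensions, private use, and grandfathered tags. The names WELL_FORMED and is_well_formed are hypothetical.

```python
import re

# Simplified shape only: language[-script][-region][-variant ...].
# Assumption: extlang, extension, private-use and grandfathered forms
# are intentionally omitted from this sketch.
WELL_FORMED = re.compile(
    r"""
    (?P<language>[A-Za-z]{2,3})
    (?:-(?P<script>[A-Za-z]{4}))?
    (?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?
    (?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*
    """,
    re.VERBOSE,
)


def is_well_formed(tag):
    """Return True if the tag has the simplified shape above.

    This says nothing about validity: whether each subtag is actually
    registered can only be decided against a dated copy of the registry.
    """
    return WELL_FORMED.fullmatch(tag) is not None


if __name__ == "__main__":
    for candidate in ["az-Arab-IR", "es-419", "de-1996", "en-US-Latn", "en--US"]:
        print(candidate, is_well_formed(candidate))
```

A validating processor would add a second step after this one: looking up each subtag in a specific version of the registry.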

    *Utility.* It is sometimes important to be able to differentiate
    between written forms of a language -- for many implementations this
    is more important than distinguishing between the spoken variants of
    a language.  Languages are written in a wide variety of different
    scripts, so this document provides for the generative use of ISO
    15924 script codes.  Like the generative use of ISO language and
    country codes in RFC 3066, this allows combinations to be produced
    without resorting to the registration process.  The addition of UN
    M.49 codes provides for the generation of language tags with regional
    scope, which is also required by some applications.

    The recast of the registry from containing whole language tags to
    subtags is a key part of this.  An important feature of RFC 3066 was
    that it allowed generative use of subtags.  This allows people to
    meaningfully use generated tags, without the delays in registering
    whole tags or the need to register all of the combinations that might
    be useful.

    The choice of placing the extended language and script subtags
    between the primary language and region subtags was widely debated.
    This design was chosen because the prevalent matching and content
    negotiation schemes rely on the subtags being arranged in order of
    increasing specificity.  That is, the subtags that mark a greater
    barrier to mutual intelligibility appear left-most in a tag.  For
    example, when selecting content written in Azerbaijani, the script
    (Arabic, Cyrillic, or Latin) represents a greater barrier to
    understanding than any regional variations (those associated with
    Azerbaijan or Iran, for example).  Individuals who prefer documents
    in a particular script, but can deal with the minor regional
    differences, can therefore select appropriate content.  Applications
    that do not deal with written content will continue to omit these
    subtags.
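The effect of this ordering on matching can be sketched in Python. The example assumes the simple prefix-matching rule used with RFC 3066 language ranges: a range matches a tag if it equals the tag, or if it is a prefix of the tag that ends at a "-" boundary, compared case-insensitively. The function name matches and the list of available tags are hypothetical and chosen only for illustration.

```python
def matches(language_range, tag):
    """Prefix match on whole subtags, ignoring case."""
    r = language_range.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


# Illustrative tags only; not taken from any real content store.
available = ["az-Arab-IR", "az-Cyrl-AZ", "az-Latn-AZ", "az-Latn-IR"]

# A reader who wants Latin script but accepts any regional variation.
latin_script = [t for t in available if matches("az-Latn", t)]

# A reader who accepts Azerbaijani in any script.
any_script = [t for t in available if matches("az", t)]

if __name__ == "__main__":
    print(latin_script)
    print(any_script)
```

Because the script subtag precedes the region subtag, the range "az-Latn" selects both Latin-script tags regardless of region. If the region came first, the same preference could not be expressed with a single prefix.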

    *Extensibility.* Because of the widespread use of language tags, it
    is disruptive to have periodic revisions of the core specification,
    even in the face of demonstrated need.  The extension mechanism
    provides for a way for independent RFCs to define extensions to
    language tags.  These extensions have a very constrained, well-
    defined structure that prevents extensions from interfering with
    implementations of language tags defined in this document.

    The document also anticipates features of ISO 639-3 with the addition
    of the extended language subtags, as well as the possibility of other
    ISO 639 parts becoming useful for the formation of language tags in
    the future.

    The use and definition of private use tags has also been modified, to
    allow people to use private use subtags to extend or modify defined
    tags and to move as much information as possible out of private use
    and into the regular structure.

    The goal for each of these modifications is to reduce or eliminate
    the need for future revisions of this document.

    The specific changes in this document to meet these goals are:

    o  Defines the ABNF and rules for subtags so that the category of all
       subtags can be determined without reference to the registry.

    o  Adds the concept of well-formed vs. validating processors,
       defining the rules by which an implementation can claim to be one
       or the other.

    o  Replaces the IANA language tag registry with a language subtag
       registry that provides a complete list of valid subtags in the
       IANA registry.  This allows for robust implementation and ease of
       maintenance.  The language subtag registry becomes the canonical
       source for forming language tags.

    o  Provides a process that guarantees stability of language tags, by
       handling reuse of values by ISO 639, ISO 15924, and ISO 3166 in
       the event that they register a previously used value for a new
       purpose.

    o  Allows ISO 15924 script code subtags and allows them to be used
       generatively.  Defines a method for indicating in the registry
       when script subtags are necessary for a given language tag.

    o  Adds the concept of a variant subtag and allows variants to be
       used generatively.

    o  Adds the ability to use a class of UN M.49 tags for supra-national
       regions and to resolve conflicts in the assignment of ISO 3166
       codes.

    o  Defines the private use tags in ISO 639, ISO 15924, and ISO 3166
       as the mechanism for creating private use language, script, and
       region subtags respectively.

    o  Adds a well-defined extension mechanism.

    o  Defines an extended language subtag, possibly for use with certain
       anticipated features of ISO 639-3.
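The first item in the list, determining the category of each subtag without the registry, can be sketched in Python as follows. The rules are a simplified reading based on subtag length, content, and position. The sketch assumes an ASCII tag that is already well-formed and does not handle grandfathered tags. The function name classify and the category labels are hypothetical.

```python
def classify(tag):
    """Label each subtag of a well-formed tag by length and position.

    Assumptions: ASCII input, already well-formed, no grandfathered tags.
    """
    subtags = tag.split("-")
    result = [(subtags[0], "language")]
    section = None  # set once an extension or private-use singleton is seen
    for sub in subtags[1:]:
        if section == "privateuse":
            kind = "privateuse"          # everything after "x" is private
        elif len(sub) == 1:
            section = "privateuse" if sub.lower() == "x" else "extension"
            kind = section + "-singleton"
        elif section == "extension":
            kind = "extension"           # belongs to the last singleton
        elif len(sub) == 3 and sub.isalpha():
            kind = "extlang"
        elif len(sub) == 4 and sub.isalpha():
            kind = "script"
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            kind = "region"
        else:
            kind = "variant"             # 5-8 characters, or digit + 3
        result.append((sub, kind))
    return result


if __name__ == "__main__":
    for example in ["az-Latn-IR", "de-CH-1996", "es-419", "en-x-foo"]:
        print(example, classify(example))
```

No lookup is needed at any step: the length and character class of a subtag, together with what precedes it, are enough to assign a category.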
Received on Tuesday, 18 October 2005 10:27:30 UTC