Re: Standards for language tags

Renato Iannella wrote (privately):

>Hi Misha, at DC4 you mentioned a standard
>that listed the different types of languages
>such as:
> en-uk
> en-us
> ...
>
>Which one was it?

My reply may be of interest to others.

The relevant standards are:

   ISO 639 - Codes for the representation of names of languages

   ISO 3166 - Codes for the representation of names of countries

   RFC 1766 - Tags for the Identification of Languages

   RFC 2070 - Internationalization of the Hypertext Markup Language

Some brief notes:

1.  ISO 639 allows the use of ISO 3166 codes to qualify ISO 639 codes.

2.  RFC 1766 defines a more general tag structure, in which ISO 639 
    language codes and ISO 3166 country codes are used as components.

3.  RFC 1766 defines the linking character to be a "-", as in "en-us".

4.  RFC 1766 defines a registry for additional sub-tags.  Two have been 
    registered to date: "no-nyn" and "no-bok".

5.  RFC 2070 redefines the interpretation of RFC 1766 tags, making the 
    hierarchy meaningful (rather than just a mechanism for tag 
    construction):

    --- start of quote from RFC 2070 -----------------------------------

    In the context of HTML, a language tag is not to be interpreted as a
    single token, as per RFC 1766, but as a hierarchy. For example, a
    user agent that adjusts rendering according to language should
    consider that it has a match when a language tag in a style sheet
    entry matches the initial portion of the language tag of an element.
    An exact match should be preferred. This interpretation allows an
    element marked up as, for instance, "en-US" to trigger styles
    corresponding to, in order of preference, US-English ("en-US") or
    'plain' or 'international' English ("en").

    --- end of quote from RFC 2070 -------------------------------------

6.  ISO 639 was last published in 1988.  A few changes were made in 1989:

    --- start of quote from RFC 1766 -----------------------------------

    The following codes have been added in 1989 (nothing later): ug
    (Uigur), iu (Inuktitut, also called Eskimo), za (Zhuang), he (Hebrew,
    replacing iw), yi (Yiddish, replacing ji), and id (Indonesian,
    replacing in).

    --- end of quote from RFC 1766 -------------------------------------

7.  As some people seem very keen on the three-letter language tags in 
    the style used by the US Library of Congress (LoC), someone could 
    decide to issue an RFC updating RFC 1766 to include support for an 
    updated ISO 639 that incorporates these three-letter tags.  [I believe 
    that voting on such a change to ISO 639 is currently in progress.]  
    This idea was aired at Canberra.  My mentioning it here should not be 
    taken as an expression of support, but rather as noting a possible way 
    to reconcile these two schemes.

8.  Do read the two RFCs mentioned above.
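
To make note 5 concrete, here is a minimal Python sketch of the 
hierarchical matching that the RFC 2070 quote describes: a style sheet 
tag matches when it is the initial portion of the element's tag, and the 
most specific match is preferred.  The function names are my own 
invention, not taken from either RFC.  I am also assuming that "initial 
portion" means whole sub-tags (so "en" does not match "eng"), and that 
tags are compared case-insensitively, as RFC 1766 requires.

```python
def split_tag(tag):
    """Split a tag such as "en-US" into lowercase sub-tags: ["en", "us"]."""
    return tag.lower().split("-")


def matches(style_tag, element_tag):
    """True if style_tag equals element_tag or is a leading run of its sub-tags.

    Assumption: matching happens on sub-tag boundaries, not raw characters.
    """
    style = split_tag(style_tag)
    element = split_tag(element_tag)
    return element[:len(style)] == style


def best_match(style_tags, element_tag):
    """Return the most specific matching style sheet tag, or None.

    "Most specific" is taken to mean the match with the most sub-tags,
    so an exact match wins over a shorter prefix.
    """
    candidates = [t for t in style_tags if matches(t, element_tag)]
    if not candidates:
        return None
    return max(candidates, key=lambda t: len(split_tag(t)))
```

With style sheet entries for "en", "en-US" and "fr", an element marked 
up as "en-US" picks "en-US"; if only "en" and "fr" are present, it falls 
back to "en"; if only "fr" is present, nothing matches.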

>Cheers... Renato
>. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
>Dr Renato Iannella                 http://www.dstc.edu.au/RDU/staff/ri/
>DSTC Pty Ltd                                     phone://61/7-3365-4310
>Gehrmann Labs, QLD, 4067, AUSTRALIA                fax://61/7-3365-4311
>. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
>Australian WWW Technical Conference '97 -> http://www.dstc.edu.au/aw3tc

Misha

Received on Tuesday, 25 March 1997 11:22:29 UTC