Language codes from Misha Wolf on 1997-02-20 (www-international@w3.org from January to March 1997)

From: Misha Wolf <misha.wolf@reuters.com>
Date: Thu, 20 Feb 1997 16:14:17 +0000 (GMT)
To: meta2 <meta2@mrrl.lut.ac.uk>, Search <search@mccmedia.com>, www-international <www-international@w3.org>, Unicode <unicode@unicode.org>
Message-Id: <1717141620021997/A92936/REDMS1/11B2A40E1100*@MHS>

draft-kunze-dc-00.txt - "Dublin Core Metadata for Simple Resource 
Description" says:

   4.12. Language          Label: LANGUAGE

      Language(s) of the intellectual content of the resource.  Where
      practical, the content of this field should coincide with the
      NISO Z39.53 three character codes for written languages.

Though the use of this scheme may be widespread in the (US?) library 
community, language labeling on the Internet doesn't use this scheme, but 
rather that of ISO 639 - "Codes for the representation of names of 
languages", together with ISO 3166 - "Codes for the representation of names 
of countries".  The application of these standards to the Internet is 
specified by RFC 1766 - "Tags for the Identification of Languages" and by 
RFC 2070 - "Internationalization of the Hypertext Markup Language".  These 
RFCs are, in turn, referenced by many others.

The language tags defined by RFC 1766 may be multipart, e.g. "en" and "en-us", 
but are interpreted as single tokens, without an inner structure.  RFC 2070 
introduces the concept of a language hierarchy, which is especially useful 
in the context of the Web.  A user may search for documents which are in 
"en-us" and get only those.  Alternatively, s/he may search for documents in 
"en" and would get documents in "en", "en-us", etc.
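
The hierarchical matching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not text from either RFC: the function names `matches_language` and `search`, the file names, and the sample tags are all hypothetical. A tag is taken to match a query if it is identical to it or if it extends it with a further "-" separated part, compared case-insensitively.

```python
def matches_language(query: str, tag: str) -> bool:
    """Return True if `tag` equals `query` or is a more specific form of it.

    Hypothetical helper: the comparison is case-insensitive and only
    succeeds on a whole-part boundary, so "en" matches "en-us" but a
    query of "e" would not match "en".
    """
    q = query.lower()
    t = tag.lower()
    return t == q or t.startswith(q + "-")


def search(query, documents):
    """Return the names of documents whose language tag matches `query`."""
    return [name for name, tag in documents if matches_language(query, tag)]


# Made-up sample data: (document name, language tag) pairs.
docs = [
    ("a.html", "en"),
    ("b.html", "en-us"),
    ("c.html", "en-gb"),
    ("d.html", "fr"),
]

print(search("en-us", docs))  # only the "en-us" document
print(search("en", docs))     # "en" plus every "en-..." document
```

Matching on the "-" boundary rather than on a bare string prefix keeps a short query from matching an unrelated tag that merely begins with the same letters.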

As this approach is in widespread use on the Web, the adoption of a 
different scheme, no matter how popular within a specific community, would 
be most unfortunate.

Misha

Received on Thursday, 20 February 1997 11:17:00 UTC