- From: Misha Wolf <misha.wolf@reuters.com>
- Date: Thu, 20 Feb 1997 16:14:17 +0000 (GMT)
- To: meta2 <meta2@mrrl.lut.ac.uk>, Search <search@mccmedia.com>, www-international <www-international@w3.org>, Unicode <unicode@unicode.org>
draft-kunze-dc-00.txt - "Dublin Core Metadata for Simple Resource Description" says:

> 4.12. Language
> Label: LANGUAGE
>
> Language(s) of the intellectual content of the resource. Where practical, the content of this field should coincide with the NISO Z39.53 three character codes for written languages.

Though the use of this scheme may be widespread in the (US?) library community, language labeling on the Internet doesn't use this scheme, but rather that of ISO 639 - "Codes for the representation of names of languages", together with ISO 3166 - "Codes for the representation of names of countries".

The application of these standards to the Internet is specified by RFC 1766 - "Tags for the Identification of Languages" and by RFC 2070 - "Internationalization of the Hypertext Markup Language". These RFCs are, in turn, referenced by many others.

The language tags defined by RFC 1766 are multipart, eg "en" and "en-us", but are interpreted as single tokens, without an inner structure. RFC 2070 introduces the concept of a language hierarchy, which is especially useful in the context of the Web. A user may search for documents which are in "en-us" and get only those. Alternatively, s/he may search for documents in "en" and would get documents in "en", "en-us" etc.

As this approach is in widespread use on the Web, the adoption of a different scheme, no matter how popular within a specific community, would be most unfortunate.

Misha
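
The hierarchical search described above can be illustrated with a minimal sketch. It assumes plain prefix matching on the hyphen-separated parts of a tag and case-insensitive comparison; the function names and the sample document list are hypothetical and are not taken from either RFC.

```python
def tag_matches(requested: str, document_tag: str) -> bool:
    """Return True if document_tag is the requested tag or a more
    specific tag beneath it (assumed rule: prefix match on whole parts)."""
    req = requested.lower()
    doc = document_tag.lower()
    # "en" matches "en" and "en-us", but not "eng" (no hyphen boundary).
    return doc == req or doc.startswith(req + "-")


def search_by_language(requested: str, documents):
    """Return the names of documents whose language tag matches."""
    return [name for name, tag in documents if tag_matches(requested, tag)]


# Hypothetical sample data: (document name, language tag)
documents = [
    ("a.html", "en"),
    ("b.html", "en-US"),
    ("c.html", "fr"),
    ("d.html", "en-GB"),
]

broad = search_by_language("en", documents)      # "en" and everything under it
narrow = search_by_language("en-us", documents)  # only "en-us"
```

A search for "en-us" returns only the one matching document, while a search for "en" also picks up the tags beneath it.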
Received on Thursday, 20 February 1997 11:17:00 UTC