- From: John Cowan <jcowan@reutershealth.com>
- Date: Fri, 2 Aug 2002 08:46:57 -0400 (EDT)
- To: w3c-xml-plenary@w3.org, w3c-i18n-ig@w3.org
- Cc: xml-editor@w3.org, w3c-xml-core-wg@w3.org
The W3C XML Core WG has decided to allow the value of xml:lang, the attribute for indicating the natural language of character data, to be an empty string. This allows language-less text inside language-marked text to be marked explicitly. Here's an example:

    <p xml:lang="en">
    Here is an example of some C code:
    <pre xml:lang="">
    #include "stdio.h"
    main() {printf("Hello world!");}
    </pre>
    </p>

By the present rules, there is no way to express the fact that the content of the pre element is not in English. (Computer languages are out of scope for RFC 3066 and have no codes.)

However, the WG is divided on the question of whether to issue an erratum to XML 1.0 or to make this provision part of XML 1.1.

Argument for XML 1.1: It is a new feature and as such belongs in XML 1.1, which we are conveniently issuing shortly anyway.

Argument for erratum: It is just a single new allowed value for an attribute that already got a whole lot of new values when we upgraded (by existing erratum E11) from the obsolete RFC 1766 to the current RFC 3066. For example, "haw" was an illegal tag under 1766, but refers to the Hawai'ian language now.

Note: The XML Schema Datatypes document still references the obsolete RFC, but defers to XML 1.0 2e for the exact rules, so an erratum would immediately allow the empty string in objects of type xsd:language; an XML 1.1 change would not immediately allow it.

Note: Any application that processes xml:lang already has to be prepared for thousands of legal values, most of which it will not understand. For example, de-jp is legal, symbolizing the variety of German spoken and written in Japan, whatever that might be.

Note: The existing code "und" is not synonymous with the proposed use of the empty string. The "und" code means that the text is in some natural language, but we don't know which one; the empty string means that the text is not in a natural language.

Disclosure: I personally favor issuing an erratum.
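To make the intended semantics concrete, here is a minimal sketch of how an application might resolve the language in scope for each element, assuming the empty string is accepted as proposed. It uses Python's standard xml.etree.ElementTree; the function names effective_languages and describe are hypothetical and chosen only for illustration, and the treatment of the empty string reflects the proposal rather than the present rules.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined "xml" prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(elem, inherited=None):
    """Yield (tag, language) for elem and its descendants.

    An xml:lang attribute on an element overrides the inherited value.
    None means no xml:lang declaration is in scope at all.
    """
    lang = elem.get(XML_LANG, inherited)
    yield elem.tag, lang
    for child in elem:
        yield from effective_languages(child, lang)


def describe(lang):
    """Hypothetical helper: interpret a resolved xml:lang value."""
    if lang is None:
        return "no language declared"
    if lang == "":
        # Proposed meaning: the text is not in a natural language.
        return "not natural-language text"
    if lang.lower() == "und":
        # Existing RFC 3066 code: some natural language, unknown which.
        return "natural language, undetermined"
    return "language tag " + lang


DOC = (
    '<p xml:lang="en">Here is an example of some C code: '
    '<pre xml:lang="">main() {}</pre> <em>That was it.</em></p>'
)

result = list(effective_languages(ET.fromstring(DOC)))
for tag, lang in result:
    print(tag, "->", describe(lang))
```

The pre element resolves to the empty string rather than inheriting "en" from its parent, while the em sibling still inherits "en". An application that does not understand a given value can treat it like any other unknown tag.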
Please send public comments on the question "erratum vs. XML 1.1" to xml-editor@w3.org, which is also copied on this mail. W3C-confidential comments may be sent to w3c-xml-core-wg@w3.org, which is also copied on this mail.

-- 
John Cowan <jcowan@reutershealth.com>
http://www.ccil.org/~cowan http://www.reutershealth.com
Unified Gaelic in Cyrillic script!
http://groups.yahoo.com/group/Celticonlang
Received on Friday, 2 August 2002 08:49:42 UTC