XML Core WG needs input on xml:lang=""

From: John Cowan <jcowan@reutershealth.com>
Date: Fri, 2 Aug 2002 08:46:57 -0400 (EDT)
Message-Id: <200208021259.IAA04944@mail2.reutershealth.com>
To: w3c-xml-plenary@w3.org, w3c-i18n-ig@w3.org
Cc: xml-editor@w3.org, w3c-xml-core-wg@w3.org

The W3C XML Core WG has decided to allow the value of xml:lang, the
attribute for indicating the natural language of character data, to
be an empty string in order to allow the explicit expression of
language-less text inside language-marked text.  Here's an example:

<p xml:lang="en">
  Here is an example of some C code:
  <pre xml:lang="">
     #include "stdio.h"
     int main(void) { printf("Hello world!\n"); return 0; }
  </pre>
</p>

By the present rules, there is no way to express the fact that the
content of the pre element is not in English.  (Computer languages are out
of scope for RFC 3066 and have no codes.)
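To make the inheritance rule concrete, here is a minimal sketch (not part of any spec) of how a processor might compute the language in effect for each element, using Python's standard xml.etree.ElementTree. The function name effective_langs is hypothetical. The sketch assumes the proposed rule: a missing xml:lang inherits the parent's value, while an explicit value, including the empty string, overrides it.

```python
import xml.etree.ElementTree as ET

# ElementTree reports attributes in the reserved xml: namespace under this key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_langs(elem, inherited=None):
    """Yield (tag, language) pairs in document order (hypothetical helper).

    A missing xml:lang inherits the parent's value; an explicit value,
    including the proposed empty string, overrides it. None means that
    no xml:lang is in scope at all.
    """
    lang = elem.get(XML_LANG, inherited)
    yield elem.tag, lang
    for child in elem:
        yield from effective_langs(child, lang)


doc = ET.fromstring(
    '<p xml:lang="en">Here is an example of some C code:'
    '<pre xml:lang="">int main(void) { return 0; }</pre>'
    '</p>'
)
for tag, lang in effective_langs(doc):
    print(tag, repr(lang))
```

Under the present rules the pre element could only inherit "en" or carry some other language code; the empty string is what lets it opt out.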

However, the WG is divided on the question of whether to issue an
erratum to XML 1.0 or to make this provision part of XML 1.1.

Argument for XML 1.1:  It is a new feature and as such belongs in XML 1.1,
which we are conveniently issuing shortly anyway.

Argument for erratum:  It is just a single new allowed value for an attribute
that already gained a great many new values when we moved (via the existing
erratum E11) from the obsolete RFC 1766 to the current RFC 3066.
For example, "haw" was an illegal tag under 1766, but now refers to the
Hawaiian language.
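The RFC 1766 to RFC 3066 change can be illustrated with a purely syntactic sketch. The first pattern roughly follows the pre-erratum XML 1.0 Second Edition productions [33]-[38] (a two-letter ISO 639 code, an "i-" IANA tag, or an "x-" private-use tag, followed by letter-only subcodes); the second follows the RFC 3066 grammar (a primary subtag of 1 to 8 letters, then subtags of 1 to 8 letters or digits). Neither checks that a code is actually registered, and the helper names are hypothetical.

```python
import re

# Roughly the pre-erratum XML 1.0 2e rule, modeled on RFC 1766:
# two-letter ISO 639 code, or i-/x- prefixed tag, then letter-only subcodes.
OLD_XML_LANG = re.compile(
    r"(?:[a-zA-Z]{2}|[iI]-[a-zA-Z]+|[xX]-[a-zA-Z]+)(?:-[a-zA-Z]+)*"
)

# RFC 3066: Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT).
RFC3066_TAG = re.compile(r"[a-zA-Z]{1,8}(?:-[a-zA-Z0-9]{1,8})*")


def old_ok(tag: str) -> bool:
    return OLD_XML_LANG.fullmatch(tag) is not None


def rfc3066_ok(tag: str) -> bool:
    return RFC3066_TAG.fullmatch(tag) is not None


def proposed_ok(tag: str) -> bool:
    # The Core WG proposal: any RFC 3066 tag, or the empty string.
    return tag == "" or rfc3066_ok(tag)


for tag in ["en", "haw", "de-jp", "x-pig-latin", ""]:
    print(f"{tag!r:14} old={old_ok(tag)} rfc3066={rfc3066_ok(tag)} "
          f"proposed={proposed_ok(tag)}")
```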

Note: The XML Schema Datatypes document still references the obsolete RFC,
but defers to XML 1.0 Second Edition for the exact rules, so an erratum would
immediately allow the empty string as a value of type xsd:language; an XML 1.1
change would not immediately allow it.

Note: Any application that processes xml:lang must already be prepared
for thousands of legal values, most of which it will not understand.
For example, de-jp is legal, denoting the variety of German spoken and
written in Japan, whatever that might be.
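One common way for an application to cope with tags it does not fully understand is prefix matching, the same rule HTTP uses for Accept-Language ranges: a range matches a tag if it equals the tag, or equals a prefix of it followed by "-". A minimal sketch, with a hypothetical function name, assuming that an empty xml:lang should match no language range at all:

```python
def lang_matches(lang_range: str, tag: str) -> bool:
    """Case-insensitive prefix matching of a language range against a tag.

    Matches if the range equals the tag, or equals a prefix of the tag that
    is immediately followed by "-". An empty tag (no natural language)
    never matches.
    """
    r, t = lang_range.lower(), tag.lower()
    return bool(t) and (t == r or t.startswith(r + "-"))


print(lang_matches("de", "de-jp"))  # True: a German-only tool can still cope
print(lang_matches("de", "den"))    # False: "den" is not a subtag of "de"
print(lang_matches("en", ""))       # False: the text is in no natural language
```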

Note: The existing code "und" is not synonymous with the proposed use of the
empty string.  The "und" code means that the text is in some natural language,
but we don't know which one; the empty string means that the text is not
in a natural language.
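The difference matters to software that acts on the language, such as a spell checker or hyphenator. A minimal sketch of how a processor might keep the two cases apart; the LangKind and classify names are hypothetical, and what each case then triggers is an application choice.

```python
from enum import Enum


class LangKind(Enum):
    NONE = "not natural-language text"                  # xml:lang=""
    UNDETERMINED = "a natural language, unknown which"  # xml:lang="und"
    KNOWN = "a named natural language"                  # e.g. "en", "de-jp"


def classify(value: str) -> LangKind:
    """Classify a resolved xml:lang value (hypothetical helper)."""
    if value == "":
        return LangKind.NONE
    # Language tags are case-insensitive, so "UND" is the same code as "und".
    if value.lower() == "und":
        return LangKind.UNDETERMINED
    return LangKind.KNOWN


for v in ["", "und", "de-jp"]:
    print(repr(v), "->", classify(v).value)
```

A spell checker might, for instance, skip NONE text entirely but attempt language detection on UNDETERMINED text.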

Disclosure: I personally favor issuing an erratum.

Please send public comments on the question "erratum vs. XML 1.1" to
xml-editor@w3.org. W3C-confidential comments may be sent to
w3c-xml-core-wg@w3.org. Both lists are copied on this mail.

-- 
John Cowan                              <jcowan@reutershealth.com>
http://www.ccil.org/~cowan              http://www.reutershealth.com
Unified Gaelic in Cyrillic script!
        http://groups.yahoo.com/group/Celticonlang
Received on Friday, 2 August 2002 08:49:42 GMT