- From: Al Gilman <asgilman@iamdigex.net>
- Date: Fri, 02 Aug 2002 09:29:54 -0400
- To: John Cowan <jcowan@reutershealth.com>, w3c-xml-plenary@w3.org, w3c-i18n-ig@w3.org
- Cc: xml-editor@w3.org, w3c-xml-core-wg@w3.org
At 08:46 AM 2002-08-02, John Cowan wrote:
>The W3C XML Core WG has decided to allow the value of xml:lang, the
>attribute for indicating the natural language of character data, to
>be an empty string in order to allow the explicit expression of
>language-less text inside language-marked text.  Here's an example:
>
><p lang="en">
>  Here is an example of some C code:
>  <pre xml:lang="">
>    #include "stdio.h"
>    main() {printf("Hello world!"};}
>  </pre>
></p>
>
>By the present rules, there is no way to express the fact that the
>content of the pre element is not in English.  (Computer languages are out
>of scope for RFC 3066 and have no codes.)
>
>However, the WG is divided on the question of whether to issue an
>erratum to XML 1.0 or to make this provision part of XML 1.1.
>
>Argument for XML 1.1:  It is a new feature and as such belongs in XML 1.1,
>which we are conveniently issuing shortly anyway.
>
>Argument for erratum: It is just a single new allowed value for an attribute
>that already got a whole lot of new values when we upgraded (by existing
>erratum E11) from the obsolete RFC 1766 to the current RFC 3066.
>For example, "haw" was an illegal tag under 1766, but refers to the
>Hawai'ian language now.
>
>Note: The XML Schema Datatypes document still references the obsolete RFC,
>but defers to XML 1.0 2e for the exact rules, so an erratum would immediately
>allow the empty string in objects of type xsd:language; an XML 1.1
>change would not immediately allow it.
>
>Note: Any application that processes xml:lang has to already be prepared
>for thousands of legal values, most of which it will not understand.
>For example, de-jp is legal, symbolizing the variety of German spoken and
>written in Japan, whatever that might be.
>
>Note: The existing code "und" is not synonymous with the proposed use of the
>empty string.  The "und" code means that the text is in some natural language,
>but we don't know which one; the empty string means that the text is not
>in a natural language.

This assertion is fatuous.  Un-enforceably vague.

The 'und' mark at least is well posed, if it means "one of the defined
language labels applies, but we don't know which."  This is a union type.

Distinguishing between

 a) a natural language for which there is no label registered
 b) "not a natural language"

has no portable definition among different agents applying 'lang' attribute
values, and hence should not be presumed known by these agents.

It would be fine to have a 'noneOfTheAbove' value for the 'lang' attribute.
However, for practical purposes a 'nil' on 'lang' inside a natural-language
context will be sufficient to disabuse the processor of following the rules
of the natural language in the enclosing scope.

Process question -- who defines the 'und' token?  Is this a meta-value
defined in the IETF RFC, or is this an invention of XSD Types or of XML?

Introducing a 'nil' compatible with the use thereof in XQuery would be a
suitable erratum if this is not already allowed.

Introducing the suggested sense for the null string would appear to be a
bad idea on the grounds that the sense bound to this sign is ill-posed,
not interoperable.  So don't go there.

Al

>Disclosure: I personally favor issuing an erratum.
>
>Please send public comments on the question "erratum vs. XML 1.1" to
>xml-editor@w3.org, which is also copied on this mail.
>W3C-confidential comments may be sent to w3c-xml-core-wg@w3.org, which
>is also copied on this mail.
>
>--
>John Cowan <jcowan@reutershealth.com>
>http://www.ccil.org/~cowan   http://www.reutershealth.com
>Unified Gaelic in Cyrillic script!
>        http://groups.yahoo.com/group/Celticonlang
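
Illustration of the scoping point made in the reply above (a 'nil' on 'lang' is enough to stop a processor from applying the enclosing language's rules). This is only a rough sketch, not anything the WG or either correspondent specified: the function name effective_langs is hypothetical, and treating the empty string as "no language information in scope" (returned as None) is an assumption that deliberately avoids asserting "not a natural language".

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_langs(elem, inherited=None, out=None):
    """Collect (tag, effective language) pairs in document order.

    Assumption for this sketch: xml:lang="" cancels whatever language is
    inherited from the enclosing scope, leaving None (nothing known).
    """
    if out is None:
        out = []
    value = elem.get(XML_LANG)
    if value is None:
        lang = inherited      # attribute absent: inherit from the parent
    elif value == "":
        lang = None           # empty value: drop the enclosing language
    else:
        lang = value          # explicit tag such as "en" or "und"
    out.append((elem.tag, lang))
    for child in elem:
        effective_langs(child, lang, out)
    return out


doc = ET.fromstring(
    '<p xml:lang="en">Some C code: '
    '<pre xml:lang="">main() {}</pre>'
    '<em>done</em></p>'
)
result = effective_langs(doc)
print(result)
```

Under that assumption the pre element ends up with no language at all, while its sibling em still inherits "en" from p. A processor written this way never has to decide between "unregistered natural language" and "not a natural language"; it simply stops applying English rules inside pre.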
Received on Friday, 2 August 2002 09:30:00 UTC