Re: Empty xml:lang attributes validation

Hi Stephane

Stephane Corlosquet wrote:
> Can anyone confirm whether xml:lang="" is valid or not? The XML 1.0 [6] says
> it's valid but I'm not sure if this applies to XHTML+RDFa. Is the last claim
> regarding the W3C validator reporting success on invalid markup true?
[...]
> [6] http://www.w3.org/TR/REC-xml/#sec-lang-tag

Simple question, long answer (sorry, but sometimes life is not black or 
white :-).

Indeed, the cited text (<http://www.w3.org/TR/REC-xml/#sec-lang-tag>) says:
| in addition, the empty string may be specified.

and later:
| In particular, the empty value of xml:lang is used on an element B to
| override a specification of xml:lang on an enclosing element A,
| without specifying another language.
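
To make the override concrete, here is a minimal Python sketch (standard 
library only) that computes the language in effect for each element. The 
helper name effective_langs and the sample document are made up for 
illustration; they do not come from any of the specs.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predeclared "xml" prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_langs(xml_text):
    """Return (tag, language in effect) pairs in document order."""
    root = ET.fromstring(xml_text)
    result = []

    def walk(elem, inherited):
        # An xml:lang attribute on the element, even an empty one,
        # replaces whatever was inherited from the enclosing element.
        lang = elem.get(XML_LANG, inherited)
        result.append((elem.tag, lang))
        for child in elem:
            walk(child, lang)

    walk(root, None)  # None means: no language specified at all
    return result


# Hypothetical sample: element b overrides the "en" declared on element a.
doc = '<a xml:lang="en"><b xml:lang=""><c/></b><d/></a>'
for tag, lang in effective_langs(doc):
    print(tag, repr(lang))
```

Here b and its child c end up with the empty string (no language 
specified), while d still inherits "en" from a.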


However...

For XHTML 1.0 (somewhere in 
<http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_XHTML-1.0-Strict>):
| xml:lang    language code (as per XML 1.0 spec)

and
| xml:lang    %LanguageCode; #IMPLIED

with
| <!ENTITY % LanguageCode "NMTOKEN">


Looking up NMTOKEN in XML 1.0 (<http://www.w3.org/TR/REC-xml/#nmtok>):
| Values of type NMTOKEN MUST match the Nmtoken production

and
(<http://www.w3.org/TR/REC-xml/#NT-Nmtoken>):
| [7]  Nmtoken  ::=  (NameChar)+

<http://www.w3.org/TR/REC-xml/#NT-NameChar>:
| [4a] NameChar  ::=  NameStartChar  | "-" | "." | [0-9] | #xB7 |
                      [#x0300-#x036F] | [#x203F-#x2040]

(<http://www.w3.org/TR/REC-xml/#NT-NameStartChar>):
| [4]  NameStartChar  ::=  ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
                            [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] |
                            [#x37F-#x1FFF] | [#x200C-#x200D] |
                            [#x2070-#x218F] | [#x2C00-#x2FEF] |
                            [#x3001-#xD7FF] | [#xF900-#xFDCF] |
                            [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

This indicates that (formally) an empty string is not an NMTOKEN, because 
the Nmtoken production requires at least one NameChar. So it is not a 
valid value for the xml:lang attribute as defined in the XHTML 1.0 
Strict DTD.
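
The productions quoted above can be transcribed into a regular expression 
to check this by hand. This is only a rough Python sketch: the names 
NMTOKEN_RE and is_nmtoken are my own, and it tests the raw string only 
(a validating parser would first normalize the attribute value).

```python
import re

# [4] NameStartChar, copied range by range from the production quoted above.
NAME_START_CHAR = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)

# [4a] NameChar adds "-", ".", digits, #xB7 and two more ranges.
NAME_CHAR = NAME_START_CHAR + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

# [7] Nmtoken ::= (NameChar)+   (one or more, so never empty)
NMTOKEN_RE = re.compile("[" + NAME_CHAR + "]+")


def is_nmtoken(value):
    """True if the whole string matches the Nmtoken production."""
    return NMTOKEN_RE.fullmatch(value) is not None


for candidate in ["en", "en-US", "", "en US"]:
    print(repr(candidate), is_nmtoken(candidate))
```

The "+" in production [7] is what rules out the empty string.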


For XHTML languages based on XHTML Modularization (10 April 2001 
version), xml:lang is mentioned in prose in 
<http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/abstract_modules.html#s_commonatts>
| xml:lang (NMTOKEN)

and defined in the DTD module 
(<http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/dtd_module_defs.html#a_module_XHTML_Common_Attribute_Definitions>)
| xml:lang     %LanguageCode.datatype;  #IMPLIED

with 
(<http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/dtd_module_defs.html#dtdentry_LanguageCode.datatype>)
| <!ENTITY % LanguageCode.datatype "NMTOKEN" >

So, same result as for XHTML 1.0.


The revision (XHTML Modularization 1.1) mentions xml:lang in 
<http://www.w3.org/TR/2008/REC-xhtml-modularization-20081008/abstract_modules.html#s_commonatts>
| xml:lang (CDATA)

and in the DTD module 
(<http://www.w3.org/TR/2008/REC-xhtml-modularization-20081008/dtd_module_defs.html#a_module_XHTML_Common_Attribute_Definitions>)
| xml:lang     %LanguageCode.datatype;  #IMPLIED

with 
(<http://www.w3.org/TR/2008/REC-xhtml-modularization-20081008/dtd_module_defs.html#a_module_XHTML_Datatypes>):
| <!ENTITY % LanguageCode.datatype "CDATA" >

The XML schema module references xml:lang in 
<http://www.w3.org/TR/2008/REC-xhtml-modularization-20081008/schema_module_defs.html#a_module_XHTML_Datatypes>:
| <xs:attribute ref="xml:lang" />

from <http://www.w3.org/2001/xml.xsd>:

| The union allows for the 'un-declaration' of xml:lang with the empty
| string.
|
| Formal declaration in XSD source form
|
| <xs:attribute name="lang">
|  <xs:annotation>
|   <xs:documentation>
|    <div>
|
|      <h3>lang (as an attribute name)</h3>
|      <p>
|       denotes an attribute whose value
|       is a language code for the natural language of the content of
|       any element; its value is inherited.  This name is reserved
|       by virtue of its definition in the XML specification.</p>
|
|    </div>
|    <div>
|     <h4>Notes</h4>
|     <p>
|      Attempting to install the relevant ISO 2- and 3-letter
|      codes as the enumerated possible values is probably never
|      going to be a realistic possibility.
|     </p>
|     <p>
|      See BCP 47 at <a
| href="http://www.rfc-editor.org/rfc/bcp/bcp47.txt">
|       http://www.rfc-editor.org/rfc/bcp/bcp47.txt</a>
|      and the IANA language subtag registry at
|      <a
| href="http://www.iana.org/assignments/language-subtag-registry">
|       http://www.iana.org/assignments/language-subtag-registry</a>
|      for further information.
|     </p>
|     <p>
|      The union allows for the 'un-declaration' of xml:lang with
|      the empty string.
|     </p>
|    </div>
|   </xs:documentation>
|  </xs:annotation>
|  <xs:simpleType>
|   <xs:union memberTypes="xs:language">
|    <xs:simpleType>
|     <xs:restriction base="xs:string">
|      <xs:enumeration value=""/>
|     </xs:restriction>
|    </xs:simpleType>
|   </xs:union>
|  </xs:simpleType>
| </xs:attribute>

So, in languages based on XHTML Modularization 1.1, the empty string is 
(formally) DTD-valid and XML-Schema-valid.
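
For the schema side, the union in xml.xsd boils down to "either an 
xs:language value or the empty string". A minimal Python sketch of that 
check follows; it assumes the XML Schema 1.0 pattern for xs:language and 
skips whitespace normalization, and the name is_valid_xml_lang is 
hypothetical. It does not check the value against BCP 47 or the IANA 
registry.

```python
import re

# Lexical pattern of xs:language in XML Schema 1.0 (Datatypes):
# [a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*
LANGUAGE_RE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_valid_xml_lang(value):
    """Mimic the union: xs:language, or the single enumerated value ""."""
    if value == "":
        return True  # second member type: xs:string restricted to ""
    return LANGUAGE_RE.fullmatch(value) is not None


for candidate in ["", "de", "en-US", "en_US"]:
    print(repr(candidate), is_valid_xml_lang(candidate))
```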


In the DTD for "XHTML 1.1 + RDFa" 
(<http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd>):

| xml:lang     %LanguageCode.datatype;  #IMPLIED

with (<http://www.w3.org/MarkUp/DTD/xhtml-datatypes-1.mod>)

| <!ENTITY % LanguageCode.datatype "CDATA" >

So, in "XHTML 1.1 + RDFa" the empty string is (formally) DTD-valid.

-- 
Johannes Koch
In te domine speravi; non confundar in aeternum.
                             (Te Deum, 4th cent.)

Received on Wednesday, 14 April 2010 08:29:28 UTC