- From: Johannes Koch <koch@w3development.de>
- Date: Wed, 14 Apr 2010 10:28:43 +0200
- To: Stephane Corlosquet <scorlosquet@gmail.com>
- Cc: sioc-dev@googlegroups.com, Public RDFa <public-rdfa@w3.org>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
Hi Stephane
Stephane Corlosquet wrote:
> Can anyone confirm whether xml:lang="" is valid or not? The XML 1.0 [6] says
> it's valid but I'm not sure if this applies to XHTML+RDFa. Is the last claim
> regarding the W3C validator reporting success on invalid markup true?
[...]
> [6] http://www.w3.org/TR/REC-xml/#sec-lang-tag
Simple question, long answer (sorry, but sometimes life is not black or
white :-).
Indeed, the cited text (<http://www.w3.org/TR/REC-xml/#sec-lang-tag>) says:
| in addition, the empty string may be specified.
and later:
| In particular, the empty value of xml:lang is used on an element B to
| override a specification of xml:lang on an enclosing element A,
| without specifying another language.
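To illustrate that inheritance rule, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes an element with xml:lang="" should be treated as having no language (modelled as None); the function name effective_languages is hypothetical and not part of any spec or library.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(root):
    """Return (tag, effective language) pairs in document order.

    Hypothetical helper: an element without xml:lang inherits its
    parent's language; xml:lang="" resets it to "no language" (None).
    """
    result = []

    def walk(elem, inherited):
        if XML_LANG in elem.attrib:
            value = elem.attrib[XML_LANG]
            # The empty string un-declares the inherited language.
            current = value if value else None
        else:
            current = inherited
        result.append((elem.tag, current))
        for child in elem:
            walk(child, current)

    walk(root, None)
    return result


doc = ET.fromstring('<a xml:lang="de"><b xml:lang=""><c/></b><d/></a>')
for tag, lang in effective_languages(doc):
    print(tag, lang)
```

Running it prints a de, b None, c None, d de: the empty xml:lang on b removes the language for b and its child c, while d still inherits de from a.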
However...
For XHTML 1.0 (somewhere in
<http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_XHTML-1.0-Strict>):
| xml:lang language code (as per XML 1.0 spec)
and
| xml:lang %LanguageCode; #IMPLIED
with
| <!ENTITY % LanguageCode "NMTOKEN">
Looking up NMTOKEN in XML 1.0 (<http://www.w3.org/TR/REC-xml/#nmtok>):
| Values of type NMTOKEN MUST match the Nmtoken production
and
(<http://www.w3.org/TR/REC-xml/#NT-Nmtoken>):
| [7] Nmtoken ::= (NameChar)+
<http://www.w3.org/TR/REC-xml/#NT-NameChar>:
| [4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 |
|                   [#x0300-#x036F] | [#x203F-#x2040]
(<http://www.w3.org/TR/REC-xml/#NT-NameStartChar>):
| [4] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
|                       [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] |
|                       [#x37F-#x1FFF] | [#x200C-#x200D] |
|                       [#x2070-#x218F] | [#x2C00-#x2FEF] |
|                       [#x3001-#xD7FF] | [#xF900-#xFDCF] |
|                       [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
This indicates that (formally) the empty string is not an NMTOKEN, because
the "+" in (NameChar)+ requires at least one character. So it is not a
valid value for the xml:lang attribute as defined in the XHTML 1.0
Strict DTD.
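The NMTOKEN argument can be checked mechanically. Below is a minimal Python sketch that transcribes the NameStartChar and NameChar ranges quoted above as code point pairs; the helper names is_name_char and is_nmtoken are hypothetical.

```python
# NameStartChar ranges from production [4] of XML 1.0, as inclusive
# (low, high) code point pairs.
NAME_START_RANGES = [
    (ord(":"), ord(":")), (ord("A"), ord("Z")), (ord("_"), ord("_")),
    (ord("a"), ord("z")), (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]

# Additional characters that NameChar (production [4a]) allows.
NAME_CHAR_EXTRA_RANGES = [
    (ord("-"), ord("-")), (ord("."), ord(".")), (ord("0"), ord("9")),
    (0xB7, 0xB7), (0x0300, 0x036F), (0x203F, 0x2040),
]


def is_name_char(ch):
    cp = ord(ch)
    ranges = NAME_START_RANGES + NAME_CHAR_EXTRA_RANGES
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_nmtoken(value):
    # Nmtoken ::= (NameChar)+ : the "+" demands at least one character.
    return len(value) > 0 and all(is_name_char(ch) for ch in value)


for candidate in ["en", "de-AT", ""]:
    print(repr(candidate), is_nmtoken(candidate))
```

Only the empty string is rejected here, and it fails on the length condition alone, not on any particular character.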
For XHTML languages based on XHTML Modularization (10 April 2001
version), xml:lang is mentioned in prose in
<http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/abstract_modules.html#s_commonatts>
| xml:lang (NMTOKEN)
and defined in the DTD module
(<http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/dtd_module_defs.html#a_module_XHTML_Common_Attribute_Definitions>)
| xml:lang %LanguageCode.datatype; #IMPLIED
with
(<http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/dtd_module_defs.html#dtdentry_LanguageCode.datatype>)
| <!ENTITY % LanguageCode.datatype "NMTOKEN" >
So, same result as for XHTML 1.0.
The revision (XHTML Modularization 1.1) mentions xml:lang in
<http://www.w3.org/TR/2008/REC-xhtml-modularization-20081008/abstract_modules.html#s_commonatts>
| xml:lang (CDATA)
and in the DTD module
(<http://www.w3.org/TR/2008/REC-xhtml-modularization-20081008/dtd_module_defs.html#a_module_XHTML_Common_Attribute_Definitions>)
| xml:lang %LanguageCode.datatype; #IMPLIED
with
(<http://www.w3.org/TR/2008/REC-xhtml-modularization-20081008/dtd_module_defs.html#a_module_XHTML_Datatypes>):
| <!ENTITY % LanguageCode.datatype "CDATA" >
The XML schema module references xml:lang in
<http://www.w3.org/TR/2008/REC-xhtml-modularization-20081008/schema_module_defs.html#a_module_XHTML_Datatypes>:
| <xs:attribute ref="xml:lang" />
and the referenced attribute is declared in <http://www.w3.org/2001/xml.xsd>:
| The union allows for the 'un-declaration' of xml:lang with the empty
| string.
|
| Formal declaration in XSD source form
|
| <xs:attribute name="lang">
|   <xs:annotation>
|     <xs:documentation>
|       <div>
|
|         <h3>lang (as an attribute name)</h3>
|         <p>
|           denotes an attribute whose value
|           is a language code for the natural language of the content of
|           any element; its value is inherited. This name is reserved
|           by virtue of its definition in the XML specification.</p>
|
|       </div>
|       <div>
|         <h4>Notes</h4>
|         <p>
|           Attempting to install the relevant ISO 2- and 3-letter
|           codes as the enumerated possible values is probably never
|           going to be a realistic possibility.
|         </p>
|         <p>
|           See BCP 47 at <a
|           href="http://www.rfc-editor.org/rfc/bcp/bcp47.txt">
|           http://www.rfc-editor.org/rfc/bcp/bcp47.txt</a>
|           and the IANA language subtag registry at
|           <a
|           href="http://www.iana.org/assignments/language-subtag-registry">
|           http://www.iana.org/assignments/language-subtag-registry</a>
|           for further information.
|         </p>
|         <p>
|           The union allows for the 'un-declaration' of xml:lang with
|           the empty string.
|         </p>
|       </div>
|     </xs:documentation>
|   </xs:annotation>
|   <xs:simpleType>
|     <xs:union memberTypes="xs:language">
|       <xs:simpleType>
|         <xs:restriction base="xs:string">
|           <xs:enumeration value=""/>
|         </xs:restriction>
|       </xs:simpleType>
|     </xs:union>
|   </xs:simpleType>
| </xs:attribute>
So, in languages based on XHTML Modularization 1.1, the empty string is
(formally) DTD-valid and XML-Schema-valid.
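The union type in xml.xsd can be sketched in Python as follows. This assumes the xs:language lexical pattern from XML Schema Part 2 (1.0), [a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*, and checks only the lexical form of an already-normalized value; it does not consult the IANA subtag registry. The function name is hypothetical.

```python
import re

# Lexical pattern of xs:language in XML Schema Part 2 (1.0).
XS_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_valid_xml_lang_xsd(value):
    """Sketch of the xml.xsd union type for xml:lang.

    Member 1: xs:language (lexical form only, no registry lookup).
    Member 2: xs:string restricted to the single enumerated value "".
    """
    if XS_LANGUAGE.fullmatch(value):
        return True
    return value == ""


for candidate in ["en", "de-AT", "", "en_US"]:
    print(repr(candidate), is_valid_xml_lang_xsd(candidate))
```

Note that the xs:language pattern on its own never matches the empty string (its first subtag needs 1 to 8 letters); "" is accepted only through the second, enumerated member of the union.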
In the DTD for "XHTML 1.1 + RDFa"
(<http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd>):
| xml:lang %LanguageCode.datatype; #IMPLIED
with (<http://www.w3.org/MarkUp/DTD/xhtml-datatypes-1.mod>)
| <!ENTITY % LanguageCode.datatype "CDATA" >
So, in "XHTML 1.1 + RDFa" the empty string is (formally) DTD-valid.
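As a recap of the DTD side, here is a short Python sketch. It deliberately models only the length rule, which is the only part that matters for the empty string: NMTOKEN is (NameChar)+ and needs at least one character, while CDATA accepts any string. The table and function names are hypothetical.

```python
# Datatype behind %LanguageCode; / %LanguageCode.datatype; in each DTD
# quoted above.
DTD_LANGUAGE_CODE_TYPE = {
    "XHTML 1.0 Strict": "NMTOKEN",
    "XHTML Modularization 1.0 (2001)": "NMTOKEN",
    "XHTML Modularization 1.1 (2008)": "CDATA",
    "XHTML 1.1 + RDFa": "CDATA",
}


def accepts_empty(datatype):
    """Return True if a DTD attribute of this type may be the empty string.

    Only the length rule is modelled: NMTOKEN needs at least one
    character, CDATA is any character data, including none.
    """
    if datatype == "NMTOKEN":
        return False
    if datatype == "CDATA":
        return True
    raise ValueError(f"datatype not modelled in this sketch: {datatype}")


for dtd, datatype in DTD_LANGUAGE_CODE_TYPE.items():
    print(f'{dtd}: {datatype}, xml:lang="" DTD-valid: {accepts_empty(datatype)}')
```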
--
Johannes Koch
In te domine speravi; non confundar in aeternum.
(Te Deum, 4th cent.)
Received on Wednesday, 14 April 2010 08:29:29 UTC