RE: Language declarations in XHTML 1.1

Richard identifies:
a) XHTML 1.1 supports only an xml:lang attribute.

b) xml:lang is not recognized by major user agents that process text/html.

c) HTML5 recognizes only the lang attribute for language declaration. It allows xml:lang.

Therefore:
there is no way of effectively declaring language in XHTML 1.1 documents served as text/html.

Tex:
Forcing UAs to support xml:lang, and also driving HTML5 to support xml:lang, seems like a bigger effort than adding lang to XHTML 1.1.
XHTML 1.1 has fewer users, and the UAs that support it probably already support lang.

Add lang to XHTML 1.1 and, yes, make it equivalent to xml:lang.

For HTML5, there should be a rule that if you have both lang and xml:lang, lang takes precedence.
This is equivalent to recognizing lang and allowing xml:lang.
The only case that it doesn’t cover is if you have xml:lang without lang. Is this worth worrying about in HTML?
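
As a rough illustration of the precedence rule above, here is a minimal Python sketch. The function name, the attribute mapping, and the fall_back_to_xml_lang switch are all hypothetical, made up for this example; nothing here is taken from the HTML5 draft or from any user agent.

```python
from typing import Mapping, Optional


def effective_language(attrs: Mapping[str, str],
                       fall_back_to_xml_lang: bool = False) -> Optional[str]:
    """Return the language a text/html UA might use for one element.

    Hypothetical helper: attrs maps attribute names as written in the
    markup ("lang", "xml:lang") to their values.
    """
    if "lang" in attrs:
        # Proposed rule: when both attributes are present, lang wins.
        return attrs["lang"]
    if fall_back_to_xml_lang and "xml:lang" in attrs:
        # The uncovered case: xml:lang without lang. Only honored if the
        # (assumed) fallback switch is turned on.
        return attrs["xml:lang"]
    # No usable declaration on this element.
    return None
```

With the switch off, an element carrying only xml:lang yields no language at all, which is the case I am asking about.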

Do we have any statistics on meaningful usage of lang?
(By meaningful, I am referring to usage in multilingual documents, or labeling of a document where it is helpful to applications that wouldn't otherwise be able to detect the language.)



-----Original Message-----
From: Richard Ishida [mailto:ishida@w3.org] 
Sent: Friday, September 26, 2008 10:25 AM
To: www-international@w3.org
Subject: Language declarations in XHTML 1.1


The 2nd edition of XHTML 1.1 says:

"XHTML 1.1 documents SHOULD be labeled with the Internet Media Type text/html as defined in [RFC2854] or application/xhtml+xml as defined in [RFC3236]." [1]

Unlike XHTML 1.0, however, XHTML 1.1 does not define a lang attribute, only an xml:lang attribute.

xml:lang is not recognized for language declaration by major user agents that process text/html, and the HTML5 spec currently recognizes only the lang attribute for language declaration, although it allows xml:lang.

The upshot of this is that there is no way of effectively declaring language in XHTML 1.1 documents served as text/html.


One approach to this issue would be to add a lang attribute to XHTML 1.1, but you would have to get authors to continue to use both lang and xml:lang for such documents so that language is recognized in both HTML and XML contexts.  This is already a nuisance for authors who use XHTML 1.0.  I do not expect that XML applications would begin to recognize the lang attribute, so it would be there purely for compatibility with HTML.
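
To make the dual declaration concrete, here is a small Python sketch. It assumes the standard-library XML parser as a stand-in for an XML application; the function name and the sample markup are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Namespace that the reserved "xml" prefix is bound to.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Sample XHTML 1.0 style element carrying both attributes (made up).
SNIPPET = ('<p xmlns="http://www.w3.org/1999/xhtml" '
           'lang="fr" xml:lang="fr">Bonjour</p>')


def declared_languages(markup: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (lang, xml:lang) as seen on the root element of markup."""
    element = ET.fromstring(markup)
    # Plain lang: the attribute a text/html consumer looks for.
    html_view = element.get("lang")
    # xml:lang: the parser exposes it under the XML namespace.
    xml_view = element.get("{%s}lang" % XML_NS)
    return html_view, xml_view
```

The two values are read from entirely separate attributes, so an author who supplies only one of them leaves the other context without a declaration.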

The other approach would be for user agents to recognize that an xml:lang attribute is saying the same thing as a lang attribute, and to specify that equivalence in HTML5.  This would also make life easier for authors using any flavor of XHTML, since they would only need to specify language in a single attribute (xml:lang) and it would work in both HTML and XML contexts.

This is my proposed solution.  I know that this then pulls in questions about the use of the xml:lang namespace or not, and what to do with a lang and an xml:lang attribute on the same element with different values, but those are second-order questions in my mind.

Do we recommend this to the HTML WG?

RI



[1] http://www.w3.org/TR/xhtml11/conformance.html


============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/

http://rishida.net/

Received on Tuesday, 30 September 2008 10:54:18 UTC