ISSUE 88 - Change proposal (updated)

(I am still waiting for the chairs’ acknowledgement.)

ISSUE 88
========

HTML5 Change Proposal for Content-Language
http://www.w3.org/html/wg/wiki/ChangeProposals/lang_versus_contentLanguage

Date: 9th of April.

Summary
-------
  * Only the last occurring meta content-language counts w.r.t. 
authoring conformance.
  * The value of the content attribute of the last occurring meta 
content-language element must be the empty string.
  * The value of the content attribute in possible preceding meta 
content-language elements should conform to RFC 2616, and validators 
may check those preceding elements for RFC 2616 conformance. 
However, only the value of the last occurring meta content-language 
element has any bearing on the document’s HTML5 validity.
  * Ian’s language determination algorithm is changed in one respect: If 
the last occurring meta content-language declaration is empty, then it 
must be interpreted by user agents as having the same semantics as an 
empty lang or xml:lang attribute – meaning that they must not ask if 
the HTTP header has any other language information to provide. (Thus, 
only when the last occurring meta declaration contains multiple 
language tags, would conforming user agents be required to pay 
attention to whether the HTTP header contains a language tag or not.)
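As a rough illustration, the authoring rule above could be checked along the following lines. This is only a minimal sketch and not taken from any existing validator: the class and function names are hypothetical, and it assumes that a document with no meta content-language element at all passes the check, which the summary does not state explicitly.

```python
from html.parser import HTMLParser


class ContentLanguageCollector(HTMLParser):
    """Collect the content value of every meta content-language element."""

    def __init__(self):
        super().__init__()
        self.values = []  # content values, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() == "content-language":
            # A missing or valueless content attribute is recorded as None.
            self.values.append(attr.get("content"))


def last_content_language_is_conforming(html):
    """Apply the proposed rule: only the last declaration counts,
    and its content attribute must be the empty string."""
    collector = ContentLanguageCollector()
    collector.feed(html)
    if not collector.values:
        # Assumption: no meta content-language at all is not an error.
        return True
    return collector.values[-1] == ""
```

Preceding declarations are collected but deliberately ignored by the check, mirroring the point that only the last occurring element has a bearing on HTML5 validity.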

Rationale
---------
  * The last occurring meta content-language element always wins in 
current user agents – let’s spec this.
  * At the same time, as Ian explains in his change proposal variants, 
interpretation of content-language differs across browsers.
  * The safest value is the empty string, as this value doesn’t 
interfere with how user agents interpret lang and xml:lang. Most 
user agents already interpret this value in accordance with this change 
proposal. (Only Gecko treats it in accordance with Ian’s zero change 
proposal.) Therefore, only the empty string should be considered 
conforming (in the last occurring meta declaration). Through this, 
authors see for themselves that they must apply the lang attribute 
whenever they want to declare the language of the document.
  * By not counting the value of possible preceding meta 
content-language elements when HTML5 conformance is evaluated, we 
satisfy two communities: the I18N community (who want to be able to use 
multiple values) and authors wanting to create HTML5 documents that 
work in Mozilla browsers (they want to be able to cancel the effect of 
HTTP headers in Gecko).
  * By treating the empty string in the content attribute as equal to 
an empty lang attribute, we simplify the algorithm for user agents: 
this is already how all of them, except Gecko, work. At the same time, 
we keep things more predictable for authors.

Details
-------
  1. The authoring requirements for meta content-language must change, 
as described above.
  2. The language determination algorithm must change as described 
above.
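To make the algorithm change concrete, here is a minimal sketch of the fallback step as I read it from the summary. It covers only the case where no lang or xml:lang attribute applies. The function name and parameters are hypothetical. It is also an assumption that a whitespace-only value counts as empty, and that the HTTP header value is passed through unchanged, since the proposal does not say how a multi-valued header is handled.

```python
def fallback_language(meta_values, http_content_language=None):
    """Return the fallback language tag, or "" for unknown.

    meta_values: content values of the meta content-language
        declarations, in document order (only the last one counts).
    http_content_language: raw Content-Language header value, or None.
    """
    if meta_values:
        last = (meta_values[-1] or "").strip()
        if last == "":
            # Proposed change: an empty declaration means "unknown",
            # like an empty lang; the HTTP header is not consulted.
            return ""
        if "," not in last:
            # A single language tag is used directly.
            return last
        # Multiple tags: fall through to the HTTP header.
    if http_content_language:
        # Assumption: the header value is used as-is.
        return http_content_language.strip()
    return ""
```

For example, fallback_language([""], "en") gives the empty string, whereas fallback_language(["en, fr"], "de") falls back to the header and gives "de".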

Impact
------
  * Predictability: Authors have experience with how things work 
today, and this proposal is the best match with current reality. The 
empty string is the meta content-language value with the best 
cross-browser compatibility.
  * We allow those in the know to follow RFC 2616 and/or fix the issues 
with Gecko by reserving preceding meta content-language elements for 
this.
  * We send a strong signal – a requirement to eventually use an empty 
meta content-language element! – about the need to use lang for setting 
the language of the document.
  * We allow authors to make use of HTML5’s semantics of the empty 
lang attribute in many current browsers, and put pressure on 
authors and vendors to implement this new semantic feature of lang.

Risks
-----
  * None.

References
----------

How meta content-language affects different browsers.

IE8 edge mode
-------------
  1. IE8 in edge mode understands the CSS :lang(*) selector. 
  2. It interprets both the meta declaration and the HTTP header. 
  3. It doesn’t let the interpretation of an empty lang be affected by 
the content-language meta declaration and/or the HTTP header.

Gecko
-----
  1. Gecko does respect the semantics of the empty lang. Thus, in a 
page where all the language information arrives only from lang or 
xml:lang (that is: no meta content-language which Gecko is able to read 
is present), the CSS selector 
       div[lang=""]:lang(en){background:red}
     does not match, which is the correct behavior. [1]
  2. But Gecko (Firefox version 2 and onwards) is immediately affected 
if a meta content-language declaration with a language tag is inserted. 
[2]
  3. At the same time, Gecko doesn’t treat an empty meta 
content-language declaration the same way that it treats an empty lang. 
In this case, instead of accepting that the language is unknown (like 
IE8, KHTML, Webkit, Chrome and Opera), it looks at the preceding meta, 
if any. [3]
  4. When there is no preceding meta, it looks at the HTTP header, if 
any. [4]
  5. These issues can be corrected by inserting a cancelling code in 
the preceding (the second last occurring) meta content-language 
declaration. [5]
  6. With these authoring guides, one can also use multiple values, 
without any negative effect. [6]

KHTML, Webkit, Chrome
---------------------
  1. These browsers do not look at the HTTP header. They also treat 
the empty meta content-language like they treat an empty lang. But 
these browsers have a bug in that they do not respect the semantics of 
the empty lang. [7]
  2. They treat the meta content-language element the same way. (And 
then the Mozilla bug also kicks in.) [8]
  3. Thus, from these browsers’ point of view, the requirement that the 
last occurring meta content-language must be empty, is often 
irrelevant, as long as the author has used a non-empty lang on the root 
element.
  4. But when authors do not use a non-empty lang on the root element, 
then the requirement that the last occurring meta content-language 
element must be empty, can still be useful when creating cross browser 
solutions which try to be compatible with KHTML, Webkit and Chrome as 
well.

Opera
-----
  * Opera also has issues with how it reacts to the meta 
content-language values. Thus this change proposal is also useful for 
current versions of Opera.

Other browsers
--------------
  * I have so far not been able to test other browsers with CSS 
*:lang(*) support.

[1] 
http://malform.no/testing/html5/attr-lang/mozilla-lang-lottery/lang-inherit

[2] 
http://malform.no/testing/html5/attr-lang/mozilla-lang-lottery/lang-inherit-cl

[3] 
http://malform.no/testing/html5/attr-lang/mozilla-lang-lottery/lang-inherit-cl-empty

[4] 
http://malform.no/testing/html5/attr-lang/mozilla-lang-lottery/lang-inherit-cl-empty-http

[5] 
http://malform.no/testing/html5/attr-lang/mozilla-lang-lottery/lang-inherit-cl-empty-http-cancel

[6] 
http://malform.no/testing/html5/attr-lang/mozilla-lang-lottery/lang-inherit-cl-empty-http-cancel-multiple

[7] 
http://malform.no/testing/html5/attr-lang/mozilla-lang-lottery/kwc-lang

[8] 
http://malform.no/testing/html5/attr-lang/mozilla-lang-lottery/kwc-cl

-- 
leif halvard silli

Received on Friday, 9 April 2010 08:24:25 UTC