RE: ISSUE-88 / Re: what's the language of a document ?

Leif Halvard Silli, Fri, 12 Mar 2010 22:43:47 +0100, replying to Addison:
  [...]
>> Ideally *all* documents will populate the root element 
>> with an appropriate @lang attribute (although, please note, that 
>> there exist cases in which an empty attribute *is* the appropriate 
>> value).
> 
> There is a difference between an empty attribute and no attribute. But 
> only in XML: [1]
  [...]
> But not in HTML.  
  [...]
> Then at least Firefox and Safari will treat the element as if nothing 
> has been declared, and thus apply the language inside <meta> C-L as a 
> fallback solution. I have Live DOM Viewer test that you can test for 
> this. [2] Whereas Internet Explorer 8, will use the XML behaviour.

I'm very glad you mentioned the issue of empty lang="" ... It is a very 
significant, bug-related use case for this issue!

Let's look at Firefox: The only way to make sure that Firefox 
interprets an empty @lang in the root element in a manner similar to 
how an empty lang is interpreted in XML is to *also* provide a single, 
*white-space filled* <meta> content-language element. The presence of 
that <meta> content-language element causes the user agent to ignore 
the content-language header coming from the server.

A white-space filled <meta> content-language element validates as XHTML 
in the W3C Validator, but it does not currently validate as HTML5.

Also note that an empty lang="" *is* interpreted in the XML way even in 
Firefox, *provided* that there aren't any content-language headers 
(whether from the server or in the document) with one (or more) actual 
language tags inside. If you manage to remove the content-language 
header, or if you manage to silence it the way I described above (with 
a white-space filled <meta> content-language element), then an empty 
lang="" attribute *will* have the effect that the element doesn't 
inherit the language from its parent element. (But if you do not 
silence the content-language header like this, then the element will 
associate itself with *one* of the languages of the content-language 
header.)
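
A rough sketch of the Firefox behaviour described above, written as a 
small Python model. This is only an illustration of the observations in 
this message, not Firefox's actual code. The function names are 
hypothetical, and the model assumes at most one <meta> content-language 
element and comma-separated language tags. It deliberately does not say 
*which* of several tags Firefox picks, since that is not established 
here.

```python
from typing import List, Optional


def parse_language_tags(value: Optional[str]) -> List[str]:
    """Split a content-language value into its language tags.

    A missing (None), empty, or white-space-only value yields no tags.
    """
    if value is None:
        return []
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def effective_content_language(meta_content: Optional[str],
                               http_header: Optional[str]) -> List[str]:
    """Return the content-language tags that remain in effect.

    meta_content is the content of the <meta> content-language element,
    or None when the document has no such element. Per the behaviour
    described above, the mere presence of the element (even a
    white-space filled one) makes the server's header be ignored.
    """
    if meta_content is not None:
        return parse_language_tags(meta_content)
    return parse_language_tags(http_header)


def empty_lang_is_honoured(meta_content: Optional[str],
                           http_header: Optional[str]) -> bool:
    """True when lang="" gets the XML meaning (no inherited language)."""
    return not effective_content_language(meta_content, http_header)


# White-space filled <meta> silences a server header with real tags.
print(empty_lang_is_honoured(" ", "en, no"))
# Without the <meta>, the server header stays in effect.
print(empty_lang_is_honoured(None, "en, no"))
```

In this model, the first call reports that the empty lang="" is 
honoured and the second reports that it is not, which matches the two 
cases described above.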

So, I would like to extend my not yet written change proposal to say 
that a white-space filled <meta> content-language element should be 
valid - like it is in XHTML. For Firefox, a single such element is 
enough: it will silence the effect of the content-language header. 
(This was totally new to me ... On the surface, Safari has the same 
issue, but a more thorough look shows that Safari fails to respect the 
semantics of an empty lang="" regardless of whether there is a relevant 
content-language header or not.)

(And clearly we should file bugs for WebKit and Mozilla about how they 
handle an empty lang="".)

Finally:

I think *both* Ian and the I18N group make the mistake of treating the 
<meta> element differently from the HTTP header.  

W.r.t. the I18N group: Take the suggestion that the order of the 
language tags inside the <meta> content-language element should be 
significant. So what if there is no <meta> content-language element, 
but instead there are several content-languages specified on the 
server? Do you want to regulate the order of the language tags coming 
from the server as well? (The tests that Richard has made show that 
Firefox and IE listen to both - which is in line with HTML4.) I think 
that *only* if you really and truly edit the HTTP specification to say 
that the order matters, can we implement something like that for the 
<meta> content-language element.

W.r.t. Ian: What I explained about the Firefox behaviour above shows 
that restricting the <meta> content-language element to only one 
language tag fails to solve all the interoperability problems that this 
element has. And also: Currently Validator.nu asks authors to remove 
the <meta> content-language element and use @lang instead (even when it 
contains only one language tag). However, this advice only fools 
authors into thinking that they can get rid of the effect of the 
content-language header by removing the <meta> content-language 
element. Which they can't! Unless they, in the same go, also remove the 
content-language headers that the server sends out.
--  
leif halvard silli

Received on Saturday, 13 March 2010 05:02:27 UTC