RE: ISSUE-88 / Re: what's the language of a document ?

CE Whitehead, Sat, 20 Mar 2010 19:31:21 -0400:

[ Snip. Reply to remaining part of letter: ]

> RE: ISSUE-88 / Re: what's the language of a document ?
> From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 
> Date: Sat, 20 Mar 2010 06:11:28 +0100
>> However, if there is only one <meta> content-language element, then 
>> this element is both the first and the last, at once. ;-) Thus user 
>> agents will use it for setting the language. But web servers will also 
>> use the same element. If there are two, then web servers should use the 
>> first while user agents use the last.
> But that's a "should."  And user agents should use the html lang= or 
> xml lang= too, right? And if they did we would not need two meta 
> content-language elements, right?

There are some XHTML document types which forbid the @lang attribute. 
When you serve these document types as 'text/html', then all language 
info is lost, as xml:lang="<whatever>" is not respected in 'text/html'. 
For such documents, using <meta> content-language enables you to at 
least define *one* language (for all elements) in a user agent 
compatible way.

But otherwise, the current text in HTML5 says basically the same thing 
as you: We don't need to use <meta> content-language - use lang="<*>" 
instead. This is true in theory, and mostly in practice as well, 
except for the particular use case when you explicitly want to set the 
language of an element to unknown (by providing an empty lang 
attribute). (This use case is a result of the fact that HTML5 aligns 
the meaning of an empty lang="" with the meaning of an empty 
xml:lang="".) For this particular use
case, it is necessary to make sure that the <meta> content-language 
element says, in a user agent compatible way, that the language is 
unknown.
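
To make the lookup concrete, here is a minimal Python sketch of how I 
read the fallback order. It is a simplification for illustration, not 
the HTML5 algorithm verbatim. The function name and the list-based 
representation of the ancestor chain are my own assumptions.

```python
from typing import Optional, Sequence


def effective_language(lang_chain: Sequence[Optional[str]],
                       meta_content_language: Optional[str]) -> str:
    """Return the language of an element; "" means 'unknown'.

    lang_chain is a hypothetical representation: the lang attribute
    values from the element itself up to the root element, with None
    where the attribute is absent.
    """
    for value in lang_chain:
        if value is not None:
            # The nearest lang attribute wins. An empty value is a
            # deliberate statement that the language is unknown.
            return value.strip()
    # No lang attribute anywhere: fall back to <meta> content-language.
    if meta_content_language is not None:
        return meta_content_language.strip()
    return ""
```

A user agent that does not honour the empty lang="" in effect skips 
straight to the fallback step, which is why the fallback itself has to 
say "unknown" as well.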

[...]
> Still I ask:  why not simply ask the browsers to respect the html 
> lang="" or xml lang="" declaration if they do not?

HTML5 does ask that they respect an empty lang="" in HTML (or an empty 
xml:lang in XHTML). Neither my change proposal nor the I18N WG's 
proposal conflicts with this.

> Would the browsers be more inclined to process a second meta 
> content-lang element set to lang="" 
> than to respect the xml lang="" or html lang=""?
> That is my real question for you.

(The attribute of the <meta> content-language element which contains 
the language tag(s) isn't called 'lang', it is called 'content'.)
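
As a small illustration of where the value lives, here is a Python 
sketch that collects the 'content' values of all <meta> 
content-language elements in a document, in source order. It uses the 
standard library's html.parser. The class and function names are 
hypothetical, and no real user agent or web server is claimed to work 
this way.

```python
from html.parser import HTMLParser


class ContentLanguageCollector(HTMLParser):
    """Collect the content values of <meta http-equiv=content-language>."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # http-equiv is compared case-insensitively.
        if (attributes.get("http-equiv") or "").lower() == "content-language":
            # The language tag(s) are in 'content', not in 'lang'.
            # A missing or empty content attribute is recorded as "".
            self.values.append(attributes.get("content") or "")


def content_languages(markup):
    """Return all content-language values found, in source order."""
    collector = ContentLanguageCollector()
    collector.feed(markup)
    collector.close()
    return collector.values
```

With two such elements, values[0] would be the one a web server reads 
and values[-1] the one a user agent reads, following the first/last 
convention from my previous message.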

Assuming that you meant 'empty lang=""' (and not 'any lang="", empty or 
not'), then the answer to your question is that the problematic user 
agents (Mozilla family + Konqueror/Webkit/Chrome family) respect an 
empty <meta> content-language, whereas the same two browser families, 
without going into the (important!) details (again), don't (always) 
respect an empty lang="". 

I don't have anything particular to say about xml:lang="" - as I have only 
tested 'text/html' (where it has no effect).

  [...]
> Yes, so you are saying that specifying multiple languages at this point
> is equivalent to specifying lang=""

Richard has described the meaning of an empty xml:lang="" like this: [1]

	]]XML also provides a means to prevent inheritance of language using 
the empty string, ie. xml:lang="". Essentially, this says: I do not 
want to associate any language with this information.[[

HTML5 changes, I believe, an empty lang="" to have the same meaning. I 
don't know if you, by the wording "specifying multiple languages", 
meant the same as Richard. From one angle it can certainly be correct 
to say that a truly multilingual document should not associate any 
particular language with itself.

[... some multiple language issues ...]
> (I will send you a sample if you wish  -- in private email; I see no 
> reason to clutter up the list.)

Please do.

[1] http://www.w3.org/International/articles/language-tags/#overview
-- 
leif halvard silli

Received on Sunday, 21 March 2010 16:28:29 UTC