RE: ISSUE-88 / Re: what's the language of a document ?

CE Whitehead, Fri, 19 Mar 2010 14:17:59 -0400:

> Hi, Leif, all.
> 
> I agree with Leif that, for handling of multiple meta elements, 
> the w3c can retroactively align its standards to practice,
> by making the last meta element the one that is valid, 
> in the case that multiple are specified 
> (I believe that this is more in line with the w3c's  standards for 
> style code anyway;  the last style declaration that applies to an element 
> is the one that is processed; all others are ignored, right?).

CSS style rules are a good parallel. But language inheritance itself is 
another: the lookup starts with the inner element and moves to the outer 
element. And from the perspective of, let's say, a <p> element, an 
HTTP-EQUIV <meta> content-language element comes after the <html> 
element.

I think that Ian, even though he claims to treat the <meta> 
content-language not as an element which imitates the HTTP header 
anymore, still treats it as such when he insists that user agents 
should look at the *first* <meta> content-language element.

> * * * Below is some disagreement sorry for it maybe I am wrong * * *
> 
> My two cents on having two meta-elements:  if one of these is 
> omitted, which processes will the remaining element be used for? 
> (This has to be specified.)

I will see if I can make it clearer. 

However, if there is only one <meta> content-language element, then 
this element is both the first and the last, at once. ;-) Thus user 
agents will use it for setting the language. But web servers will also 
use the same element. If there are two, then web servers should use the 
first, while user agents use the last.
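
To make the first/last rule concrete, here is a rough sketch in Python. The class and function names are made up for illustration, and this only models the rule as I propose it; it is not taken from any specification or from any browser or server.

```python
from html.parser import HTMLParser


class ContentLanguageCollector(HTMLParser):
    """Collects the content value of every <meta http-equiv="Content-Language">."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # http-equiv is matched case-insensitively; a missing value becomes "".
        if (attr.get("http-equiv") or "").lower() == "content-language":
            self.values.append(attr.get("content") or "")


def pick_content_language(html):
    """Return (server_value, user_agent_value) under the proposed rule.

    Hypothetical helper: web servers read the first element,
    user agents read the last one.
    """
    collector = ContentLanguageCollector()
    collector.feed(html)
    if not collector.values:
        return None, None
    return collector.values[0], collector.values[-1]
```

With a single element both values are the same, which is the "first and last at once" case.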

> Also, we still have the html lang= and xml lang= elements/attributes 
> in any case!

Yes.

> So you suggest the html and xml lang attributes plus two meta 
> elements plus an http header?

Always use (xml:)lang="". But whether - and how - you should use the 
<meta> content-language element depends.

First: decide if you need to use the HTTP content-language header at 
all. If we assume that you should, then the question is what means to use.

The first reason not to use the <meta> c-l element for this purpose is 
that most of us do not have access to web servers/CMS-es that actually 
make any use of the content-language element. And hence, there is no 
technical goodness in it for most of us. So then why use it? It is 
better to use a method that actually works. Apache can easily be 
configured to send out the content-language header. Apache doesn't make 
use of <meta> c-l for this functionality.

So, in the usual scenario, it is not necessary to use <meta> 
content-language. In these cases users should either not use it at all, 
or they should use a *single* white-space filled <meta> c-l element for 
the purpose of cancelling the unwanted language fallback effect in 
Mozilla and Webkit/Konqueror/Chrome. The latter option (a single 
white-space filled <meta> c-l) is in my view the best use in most 
situations. (Of course, cancelling the language fallback effect is only 
meaningful if you also use @lang.)

If authors decide to use it as originally intended (with one or several 
language tags inside), and if they also want to use only a single 
element, then authors are better off if they can find a reason to 
validly fill it with more than one language tag, because then the 
language fallback effect already cancels itself in all browsers except 
Mozilla.  I will not, in these cases, *require* authors to also use a 
second white-space filled <meta> c-l element. But authors should be 
aware that this is the only way to cancel the effect in Mozilla 
browsers as well.

> Also, as the meta is only a fallback for when those are not specified 
> I am not sure we need two anyway

We cannot use "we" about this. What one needs to do depends on the 
level of control that one needs to have. If your web server sends out a 
content-language header (which is not unlikely), then both 
Mozilla browsers and IE8 will use that header as fallback language. Of 
these two, only Mozilla has the language inheritance problem. And if 
that problem is important to you to solve, then the only way to get rid 
of it, is to use a last (which could also be a single) <meta> 
content-language element with white-space inside.

> (I need convincing; to me this is a case where aligning w3c 
> specifications to the current practice --
> using the first language specified by http or meta content-language 
> to populate  the lang= attribute in the html or xml tag, as has been 
> discussed previously -- makes sense).

My main rationale is this: Given how messy this whole issue now is, 
then it seems to be very complicated to get user agents to actually 
move in the direction of making one of the languages the default one. I 
simply think it is too much to ask for. 

It seems better to focus on getting the processing language right first 
(by ensuring that it is possible for authors to legally cancel the 
effect of the problematic language fallback story of the <meta> 
content-language element), rather than complicating the issue with 
requests about making <meta> content-language containing several 
languages do things that it was not meant to do.

> Finally, will someone who ignores the html or xml lang =  
> successfully use the two meta elements? Keeping data in the right 
> order?

Firstly: There are no common user agents that ignore the lang= 
attribute, I think. But there are a few (Mozilla, 
Webkit/Konqueror/Chrome) which fail to treat an *empty* lang (<html 
lang="">) according to how HTML5 wants it to be.

Secondly: Since I discuss a problem which is related to the situation 
*when* the author uses the lang="" attribute, your question is not 
really related to the issue at hand. But I will answer you anyhow: 

1) If the document doesn't contain a single lang attribute, but still 
contains two <meta> content-language elements, where the last one 
contains white-space, then user agents would not receive language 
information from anywhere - they would not have any clue about the 
language. The same "problem" would also arise if <meta> 
content-language contains more than one language. (However, since 
<meta> c-l is not meant to define the processing language, this can't 
really be seen as a problem.)

2) But, if the last <meta> content-language element of this 
hypothetical document *does* contain a single language code, then all 
browsers that actually make use of the <meta> content-language element, 
would pick it up and use it as the language of the document. (I have 
tested IE8, Firefox, Webkit/Konqueror/Chrome. None of the Opera 
versions I tested made any use of <meta> content-language.)
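
The two cases above can be sketched like this in Python. The function name is hypothetical, and the sketch only models the behaviour as I describe it for a document without any lang attribute; it leaves out the Mozilla exception for several language tags mentioned earlier, and it is not code from any browser.

```python
def fallback_language(meta_values):
    """Language a user agent would fall back to when no lang attribute is set.

    meta_values: the content values of the <meta> content-language
    elements, in document order (hypothetical input format).
    """
    if not meta_values:
        return None
    # Only the last element counts for user agents.
    tags = [t.strip() for t in meta_values[-1].split(",") if t.strip()]
    # A single tag acts as the fallback; white-space or a list gives nothing.
    return tags[0] if len(tags) == 1 else None
```

Case 1) corresponds to `fallback_language(["en", " "])` and to `fallback_language(["en, fr"])`, case 2) to something like `fallback_language(["en, fr", "no"])`.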

> My personal opinion is that they (he, she, whoever) can just as well 
> learn to use the html / xml lang attributes as they (he, she, 
> whoever) can learn to insert an additional meta element.

There is no "just as well". The fact of the matter is that we have four 
web browsers that fail to respect the semantics of an empty lang="" 
attribute as soon as a <meta> c-l element comes into the picture. In 
particular Mozilla. If you say <p lang="">, then Mozilla will respect 
this and treat it as an element for which the language is unknown. 
*Except* when there is a Content-Language coming from the last <meta> 
c-l element *or* (when there is no <meta> element) coming from the 
server. Thus, as you can see, if we want to solve the Mozilla problem, 
then we must make sure that the last (or the single) <meta> c-l element is 
white-space filled. (The <meta> is thus both the problem and the 
solution ...)
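
As a rough model of the Mozilla behaviour for <p lang="">, here is a Python sketch. The function name and its arguments are made up for illustration; it only restates the precedence described in this message (last <meta> first, server header only when there is no <meta>) and is not Mozilla's actual code.

```python
def mozilla_fallback_source(meta_values, http_header=None):
    """Content-Language value that leaks into an element with lang="".

    meta_values: content values of the <meta> c-l elements in document order.
    http_header: value of the server's Content-Language header, if any.
    Returns None when nothing leaks, i.e. the language stays unknown.
    """
    if meta_values:
        # The last <meta> wins and shadows the server header entirely.
        source = meta_values[-1]
    else:
        source = http_header or ""
    # A white-space filled value cancels the fallback.
    return source.strip() or None
```

So `mozilla_fallback_source([" "], "en")` gives None, while `mozilla_fallback_source([], "en")` gives "en", which is why the white-space filled element has to be the last (or the single) one.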

And again: one or two <meta> c-l elements? This depends. See above.

> (I am not always for aligning w3c specifications to current practice: 
> I still want a way to specify two document or audience languages 

Sorry, I forgot: did you by "document or audience languages" mean "text 
processing language and audience language"?

> where content is truly mixed, but not two meta elements.

Truly mixed? In a hierarchic tree structure like HTML, then there is no 
way to say that the content is "truly mixed". (Not until we get 
language tags that are able to express this, at least.) If you write 
<html lang="en"> then you say that the <html> element contains English. 
It is impossible to say that it contains English and French, for 
instance. And the <meta> content-language also doesn't say that your 
document contains English and French just because you write 
content="en, fr" inside it. My proposal doesn't take away or add any 
functionality in this regard.

> The http headers and the meta elements have been the designated 
> places for this.

If you can pinpoint a place in my change proposal where I change the 
semantics of @lang or the <meta> content-language, then I would 
immediately correct it.

> I need more explanation, I guess; but I don't think I would support 
> two meta content-language elements.  

Then you should go back to the I18N group and ask them to change their 
proposal. Their proposal does not forbid two <meta> content-language 
elements. HTML4 also doesn't forbid it. Nor does any other HTML 
specification (including HTML5) that I am aware of.

The only new thing in my proposal is that I suggest that we specify that 
the last meta element is the one that counts with regard to language 
inheritance. This is the opposite of what HTML5 currently says, but in 
line with how all browsers behave. This solution, together with the 
permission to use white-space inside it (as was permitted in 
HTML4/XHTML1) will BOTH solve the default language problem AND solve 
the language inheritance problem. [Of course, it cannot solve both 
problems at the same time - but it can solve the one of these two that 
the author in question is most concerned about solving. A full solution 
to the problem requires that Mozilla browsers, Chrome, Webkit and 
Konqueror solve some bugs.]
 [...]
-- 
Leif Halvard Silli

Received on Saturday, 20 March 2010 05:12:08 UTC