RE: ISSUE-88 / Re: what's the language of a document ?

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Fri, 12 Mar 2010 23:20:45 +0100
To: CE Whitehead <cewcathar@hotmail.com>
Cc: addison@amazon.com, www-international@w3.org, public-html@w3.org, ishida@w3.org, ian@hixie.ch
Message-ID: <20100312232045786042.2357c6d2@xn--mlform-iua.no>
CE Whitehead, Fri, 12 Mar 2010 15:18:01 -0500 - in reply to Addison:
   ....
>> I think it would be appropriate to include text allowing the 
>> user-agent to infer the value of @lang from a properly 
>> formed <meta> tag in cases in which @lang is not present or empty. 
>> In that case, I would expect that the first
>> language in the <meta> tag would be assigned as an inferred value 
   [ ... ]
>>  Additional language tags can be trimmed off using a 
>> well-placed call to strtok (for example). This is what our working 
>> group's Change Proposal proposes.
>
> I thought this was Leif's proposal; perhaps I am mistaken here; but I 
> thought Leif meant to read the first value of the http header to
> infer the text-processing language when none was declared in
> the xml or html element, and to otherwise consider two values o.k.
> Thanks.

I think it was clear that I offered an alternative proposal to the one 
from the I18N WG. ;-) And I believe it was clear that this exact 
point is where my proposal differs from the I18N proposal ... 

Let us argue based on the algorithm in HTML4:

> 8.1.2 Inheritance of language codes
> An element inherits language code information according to the 
> following order of precedence (highest to lowest):
>  * The lang attribute set for the element itself.
>  * The closest parent element that has the lang attribute set (i.e., 
>    the lang attribute is inherited).
>  * The HTTP "Content-Language" header (which may be configured in a 
>    server). For example: Content-Language: en-cockney
>  * User agent default values and user preferences.

The second-to-last step - the HTTP Content-Language header option - only 
works whenever it represents just one language. And since user agents 
don't implement the last step ("User agent default values and user 
preferences"), and since multiple language tags inside the <meta> C-L 
mean that no language can be inferred (without a change to the above 
algorithm - such as looking for the first language tag), we have - in 
my view - a problem to solve.
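
To make the problem concrete, here is a minimal sketch of the HTML4 
precedence as I read it, written in Python. The function names and the 
first_tag_wins switch are hypothetical, for illustration only; they are 
not taken from any spec or user agent. The user-agent default step is 
left out, since it is not implemented in practice, and an empty lang 
attribute is treated as "not set", which is a simplifying assumption.

```python
def parse_content_language(value):
    """Split a Content-Language value into a list of language tags."""
    if not value:
        return []
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def inferred_language(lang_chain, content_language, first_tag_wins=False):
    """Return the language of an element, or None if none can be inferred.

    lang_chain: lang attribute values from the element itself up to the
    root element (None where the attribute is absent).
    content_language: the raw Content-Language value, or None.
    first_tag_wins: hypothetical switch for the "first tag counts" rule.
    """
    # Steps 1 and 2: the element's own lang, then the closest ancestor's.
    for lang in lang_chain:
        if lang:  # assumption: an empty value counts as "not set"
            return lang

    # Step 3: Content-Language, usable as-is only with exactly one tag.
    tags = parse_content_language(content_language)
    if len(tags) == 1:
        return tags[0]
    if tags and first_tag_wins:
        return tags[0]

    # Step 4 (user agent defaults) is omitted: nothing can be inferred.
    return None
```

With no lang attribute anywhere and "Content-Language: en, fr", this 
returns None unless first_tag_wins is set, which is the gap described 
above.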

Really, Validator.nu should not only check the content of the <meta> 
content-language element, but should also check the content-language 
header coming from the server. And, whenever the HTTP content-language 
from the server offers more than one language, then the validator 
should warn/recommend that the root element should have a lang attribute, 
since otherwise, there is no way to obtain the language of the document.
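
A rough sketch of the check I have in mind, in Python. This is not 
Validator.nu code; the function names, parameters and message wording 
are all hypothetical, and it only assumes that the validator has the 
root element's lang value and both Content-Language values at hand.

```python
def count_language_tags(value):
    """Count the comma-separated language tags in a Content-Language value."""
    return len([tag for tag in (value or "").split(",") if tag.strip()])


def lang_warning(root_lang, http_content_language=None,
                 meta_content_language=None):
    """Return a warning message, or None when no warning is needed."""
    # A lang attribute on the root element settles the question.
    if root_lang:
        return None

    # Otherwise look at both places where Content-Language can appear.
    sources = (
        ("HTTP header", http_content_language),
        ("<meta> element", meta_content_language),
    )
    for name, value in sources:
        if count_language_tags(value) > 1:
            return ("Content-Language in the " + name + " lists more than "
                    "one language; add a lang attribute to the root element.")
    return None
```

For example, lang_warning(None, http_content_language="en, fr") yields 
a warning, while lang_warning("en", http_content_language="en, fr") 
does not.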

Even if we specify - now - that it is the first language tag that 
counts: existing content out there doesn't know anything about this. 
Some content could also rely on the current behaviour w.r.t. how user 
agents interpret meta elements with multiple values. And I am not even 
sure that it is any good to give any special heed to the first language 
tag of the Content-Language header - if that means ignoring the rest as 
possible fallback options. 
-- 
leif halvard silli
Received on Friday, 12 March 2010 22:21:21 GMT
