
RE: Regarding update of language declaration tests (I81NWG)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Tue, 20 Apr 2010 21:54:28 +0200
To: Richard Ishida <ishida@w3.org>
Cc: www-international@w3.org
Message-ID: <20100420215428174965.9e104628@xn--mlform-iua.no>
Richard Ishida, Tue, 20 Apr 2010 07:17:36 +0100:
>> Leif Halvard Silli 19 April 2010 18:01

   http://www.w3.org/International/questions/qa-no-language

>> ]]
>> On the very rare occasion when the whole document is in an undefined
>> language it is better to just not declare the default language of the
>> document.
>> [[
>> 
>> However, this advice does not help the slightest, if the user agent is
>> inheriting a language from the Content-Language HTTP header or the
>> HTTP-EQUIV meta element.
  ...
> It's hard for me to see why, in those rare circumstances, you'd have 
> conflicting language information in the http header or meta element, 
> but if you did, there'd be no reason not to use lang="" to override 
> their effect.

The focus on "rare circumstances" doesn't catch the issue.

Authors will not always be in control of - or even know - how their 
pages will be served. Most authors don't touch the server 
configuration and things like .htaccess files. I believe that the 
number of authors who need or want to be in control of the language 
declaration is higher than the number who need to un-declare the 
entire document.

>  (The basic rule is actually stated as " you should 
> only tag text as undetermined if you can't just leave it as is.")

Meaning: if you are not certain about whether you have control, but 
need to have control, then you should tag it as undetermined? 

The QA article doesn't talk about http-equiv at all (a major 
shortcoming). But why not rather recommend that authors, in that case, 
use the "und" tag inside the META tag? That way, one could be certain 
that nothing outside the HTML code affects the language of the document.

<meta http-equiv="Content-Language" content="und" />

This would satisfy as a solution in accordance with my description:

>> for that reason, there should be a way - other than not using 
>> Content-Language (on the server side) - for making sure that the user 
>> agent does not inherit the language from Content-Language.
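
To make the fallback concern concrete, here is a minimal sketch in 
Python. It is a simplified model, not the behaviour of any particular 
browser: it assumes the lang attribute wins when present (even when 
empty), then the http-equiv META value, then the HTTP header. The 
function name and parameters are hypothetical, and comma-separated 
Content-Language lists are not handled.

```python
from typing import Optional


def effective_language(lang_attr: Optional[str],
                       meta_content_language: Optional[str],
                       http_content_language: Optional[str]) -> Optional[str]:
    """Return the language a user agent might assume for the root element.

    Assumed convention: None means the source is absent, while ""
    means lang="" was written explicitly (language unknown).
    """
    # An explicit lang attribute, even an empty one, overrides the fallbacks.
    if lang_attr is not None:
        return lang_attr
    # Otherwise fall back to the META http-equiv value, then the HTTP header.
    for fallback in (meta_content_language, http_content_language):
        if fallback:
            return fallback.strip()
    # Nothing declared anywhere: the language stays undetermined.
    return None


# No declaration in the markup: the server's header leaks through.
inherited = effective_language(None, None, "no")
# The META "und" declaration shields the page from the header.
shielded = effective_language(None, "und", "no")
# lang="" also shields the page, in user agents that support it.
emptied = effective_language("", None, "no")
```

Under these assumptions, only the first case inherits "no" from the 
server, which is the situation an author without access to the server 
configuration cannot otherwise prevent.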

> So I think that if browsers just implement support for lang="" we 
> have no issue here. 

As long as HTML5 does not disallow http-equiv="Content-Language" 
completely, then yes, I think I could live with declaring the 
http-equiv="Content-Language" as "und". (Great point, CE!) It should 
work in both current and future clients. 

But if, as you have been toying with, HTML5 removes 
http-equiv="Content-Language", then it is hard for me to see how one 
can claim there is no issue.

> PS: Note that the article referred to at 
> http://www.w3.org/International/questions/qa-no-language needs 
> updating to take into account the latest developments in this area.

I think the article needs to be updated on 7 points, of which only 
the ones related to HTML5 represent a "latest development":

1) HTML5 empty string {new issue}
2) HTML5 empty string vs legacy UA support {new issue}
3) XHTML empty string vs UA support {old issue}

And when it comes to the QA article's advice "to just not declare the 
default language of the document", it should mention the 
(possible) unwanted language fallback effect of Content-Language

4) in legacy XHTML user agents  {old issue}
5) in legacy (aka HTML4) user agents {old issue}
6) in HTML5 UAs {new issue}

7) The possibility of using
   <meta http-equiv="Content-Language" content="und" />

Btw, some data: 

Based on Opera's MAMA, 9.11% of occurrences of the root (<html>) 
element on the Web have a @lang. [1] But 1.55% of pages do not include 
<html> in the code [2], thus only 8% of pages have a @lang on the root 
element. All these pages will be affected by either HTTP or HTTP-EQUIV 
Content-Language - now and in the future.

As for http-equiv="Content-Language", 13% of Web pages use it (456078 
[3] divided by 3503482 [1]), while 1.75% (61240 [4] divided by 3503482 
[1]) use the "real" Content-Language http header. Thus, up to 15 
percent of all web pages may use either http-equiv="Content-Language" 
or http Content-Language (some pages probably use both http-equiv and 
http). That is a higher percentage than the share of pages using the 
lang="" attribute.
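
The arithmetic behind those percentages can be checked with a few 
lines of Python. The counts are the MAMA figures cited above; the 
variable names are only illustrative, and the combined figure is an 
upper bound because it assumes no page uses both mechanisms.

```python
# Page counts taken from the MAMA listings cited as [1], [3] and [4].
pages_total = 3503482
pages_with_meta_http_equiv = 456078
pages_with_http_header = 61240

# Share of pages using each mechanism, in percent.
meta_share = 100 * pages_with_meta_http_equiv / pages_total
header_share = 100 * pages_with_http_header / pages_total

# Upper bound: assumes the two sets of pages do not overlap.
combined_upper_bound = meta_share + header_share

print(f"http-equiv: {meta_share:.2f}%")
print(f"HTTP header: {header_share:.2f}%")
print(f"combined (upper bound): {combined_upper_bound:.2f}%")
```

The results come out at roughly 13%, 1.75% and a little under 15%, in 
line with the figures quoted above.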

This, in my view, speaks against making http-equiv="Content-Language" 
illegal in HTML5 documents.

Also, while 1.75% of Web pages using the "real" HTTP 
Content-Language header sounds small, the Web itself is very large, 
so even a small share is a big number of pages. Many html elements 
are used far less often than the HTTP Content-Language header occurs. 
(For example, the address element is used less than the real HTTP 
Content-Language [2].) Thus, 1.75% is a real use case - it should not 
be ignored.

[1] http://dev.opera.com/articles/view/mama-common-attributes/#lang
[2] http://devfiles.myopera.com/articles/532/elemlist-url.htm
[3] http://devfiles.myopera.com/articles/575/metahttpequivlist-url.htm
[4] http://devfiles.myopera.com/articles/554/httpheaders-contentlang-url.htm
-- 
leif halvard silli
Received on Tuesday, 20 April 2010 19:55:03 GMT
