- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Tue, 20 Apr 2010 21:54:28 +0200
- To: Richard Ishida <ishida@w3.org>
- Cc: www-international@w3.org
Richard Ishida, Tue, 20 Apr 2010 07:17:36 +0100:
>> Leif Halvard Silli 19 April 2010 18:01
>> http://www.w3.org/International/questions/qa-no-language
>> ]]
>> On the very rare occasion when the whole document is in an undefined
>> language it is better to just not declare the default language of the
>> document.
>> [[
>>
>> However, this advice does not help the slightest, if the user agent is
>> inheriting a language from the Content-Language HTTP header or the
>> HTTP-EQUIV meta element.
...
> It's hard for me to see why, in those rare circumstances, you'd have
> conflicting language information in the http header or meta element,
> but it you did, there'd be no reason not to use lang="" to override
> their effect.

The focus on "rare circumstances" misses the issue. Authors are not
always in control of, or even aware of, how their pages will be served.
Most authors don't touch the server configuration or things like
.htaccess files. I believe that the number of authors who need or want
to be in control of the language declaration is higher than the number
who need to un-declare the language of the entire document.

> (The basic rule is actually stated as " you should
> only tag text as undetermined if you can't just leave it as is.")

Meaning: if you are not certain whether you have control, but need to
have control, then you should tag it as undetermined? The QA article
doesn't talk about http-equiv at all (a major shortcoming). But why not
instead recommend that authors, in that case, use the "und" tag inside
the META element? That way, one could be certain that nothing outside
the HTML code affects the language of the document.

    <meta http-equiv="Content-Language" content="und" />

This would qualify as a solution matching my description:

>> for that reason, there should be a way - other than not using
>> Content-Language (on the server side) - for making sure that the user
>> agent does not inherit the language from Content-Language.
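
To make the fallback concern concrete, here is a rough Python sketch of how a user agent might pick the document language. Everything in it is an assumption for illustration: the function name effective_language is hypothetical, the precedence order (lang attribute, then the meta http-equiv, then the HTTP header) is assumed, and treating "und" as "no language" is the behaviour proposed in this message, not something any particular user agent is known to implement.

```python
from typing import Optional


def effective_language(lang_attr: Optional[str],
                       meta_content_language: Optional[str],
                       http_content_language: Optional[str]) -> Optional[str]:
    """Return the language a user agent would assume, or None if unknown.

    None as an argument means "not declared at all"; "" means an
    explicitly empty declaration.
    """
    # An explicit lang attribute wins; lang="" explicitly means "unknown".
    if lang_attr is not None:
        return lang_attr or None

    # Assumed fallback order: the meta http-equiv first, then the HTTP header.
    for candidate in (meta_content_language, http_content_language):
        if candidate is not None:
            tag = candidate.strip()
            # Assumption: "und" (undetermined) stops the fallback and
            # yields no language, as proposed in this message.
            if tag.lower() in ("", "und"):
                return None
            return tag

    return None
```

With no lang attribute and no meta element, a server-sent "no" would be inherited; with content="und" in the meta element, or lang="" on the root element, the sketch returns None regardless of what the server sends.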
> So I think that if browsers just implement support for lang="" we
> have no issue here.

As long as HTML5 does not disallow http-equiv="Content-Language"
completely, then yes, I think I could live with declaring the
http-equiv="Content-Language" as "und". (Great point, CE!) It should
work both in current and future clients.

But if, as you have been toying with, HTML5 removes
http-equiv="Content-Language", then it is hard for me to see how one
can claim there is no issue.

> PS: Note that the article referred to at
> http://www.w3.org/International/questions/qa-no-language needs
> updating to take into account the latest developments in this area.

I think the article needs to be updated on 7 points, of which only
those related to HTML5 represent a "latest development":

1) HTML5 empty string {new issue}
2) HTML5 empty string vs legacy UA support {new issue}
3) XHTML empty string vs UA support {old issue}

And when it comes to the QA article's advice "to just not declare the
default language of the document", it should mention the (possible)
unwanted language fallback effect of Content-Language

4) in legacy XHTML user agents {old issue}
5) in legacy (aka HTML4) user agents {old issue}
6) in HTML5 UAs {new issue}

7) The possibility of using
   <meta http-equiv="Content-Language" content="und" />

Btw, some data: According to Opera's MAMA, 9.11% of occurrences of the
<html> element on the Web have a @lang [1]. But 1.55% of pages do not
include <html> in the code [2], thus only 8% of pages have a @lang on
the root element. All these pages will be affected by either HTTP or
HTTP-EQUIV Content-Language, now and in the future.

As for http-equiv="Content-Language", 13% of Web pages use it (456078
[3] divided by 3503482 [1]), while 1.75% (61240 [4] divided by 3503482
[1]) use the "real" Content-Language HTTP header.
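
The two shares can be reproduced from the MAMA counts quoted above. A small Python check follows; the variable and function names are made up for illustration, and the counts are simply the ones cited in this message.

```python
# Counts as quoted from Opera's MAMA in this message.
TOTAL_URLS = 3503482          # total number of URLs [1]
META_HTTP_EQUIV = 456078      # pages with http-equiv="Content-Language" [3]
HTTP_HEADER = 61240           # pages served with a Content-Language header [4]


def share(count: int, total: int = TOTAL_URLS) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


meta_share = share(META_HTTP_EQUIV)
header_share = share(HTTP_HEADER)

# Upper bound on pages using either mechanism: the plain sum, which
# over-counts any page that uses both.
upper_bound = meta_share + header_share

print(f"http-equiv: {meta_share:.2f}%")
print(f"HTTP header: {header_share:.2f}%")
print(f"either (upper bound): {upper_bound:.2f}%")
```

The sum is only an upper bound because pages that use both mechanisms are counted twice.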
Thus, up to 15 percent of all web pages may use either
http-equiv="Content-Language" or HTTP Content-Language (some pages
probably use both http-equiv and HTTP). That is a higher percentage
than the share of pages using the lang attribute. This, in my view,
speaks against making http-equiv="Content-Language" illegal in HTML5
documents.

Also, while 1.75% of Web pages using the "real" HTTP Content-Language
header sounds like little, the Web is both very diverse and very large.
Many HTML elements are used far less often than the HTTP
Content-Language header occurs. (For example, the address element is
used less than the real HTTP Content-Language [2].) Thus, 1.75% is a
real use case and should not be ignored.

[1] http://dev.opera.com/articles/view/mama-common-attributes/#lang
[2] http://devfiles.myopera.com/articles/532/elemlist-url.htm
[3] http://devfiles.myopera.com/articles/575/metahttpequivlist-url.htm
[4] http://devfiles.myopera.com/articles/554/httpheaders-contentlang-url.htm
--
leif halvard silli
Received on Tuesday, 20 April 2010 19:55:03 UTC