
RE: Regarding update of language declaration tests (I81NWG)

From: CE Whitehead <cewcathar@hotmail.com>
Date: Wed, 21 Apr 2010 21:13:02 -0400
Message-ID: <SNT142-w338401B90AA0E16AC799E6B3080@phx.gbl>
To: <xn--mlform-iua@xn--mlform-iua.no>, <ishida@w3.org>
CC: <www-international@w3.org>

> Date: Tue, 20 Apr 2010 21:54:28 +0200
> From: xn--mlform-iua@xn--mlform-iua.no
> To: ishida@w3.org
> CC: www-international@w3.org
> Subject: RE: Regarding update of language declaration tests (I81NWG)
> Richard Ishida, Tue, 20 Apr 2010 07:17:36 +0100:
> >> Leif Halvard Silli 19 April 2010 18:01
> http://www.w3.org/International/questions/qa-no-language
> >> ]]
> >> On the very rare occasion when the whole document is in an undefined
> >> language it is better to just not declare the default language of the
> >> document.
> >> [[
> >> 
> >> However, this advice does not help the slightest, if the user agent is
> >> inheriting a language from the Content-Language HTTP header or the
> >> HTTP-EQUIV meta element.
> ...
> > It's hard for me to see why, in those rare circumstances, you'd have 
> > conflicting language information in the http header or meta element, 
> > but if you did, there'd be no reason not to use lang="" to override 
> > their effect.
> The focus on "rare circumstances" doesn't catch the issue.
> Authors will hardly always be in control - or in the know - about how 
> their pages will be served. Most authors don't touch the server 
> configuration and things like .htaccess files. I believe that the 
> number of authors who need or want to be in control of the language 
> declaration is higher than the number who need to un-declare the entire 
> document.
Agreed.  But both are useful things to be able to do.
> > (The basic rule is actually stated as " you should 
> > only tag text as undetermined if you can't just leave it as is.")
> Meaning: if you are not certain about whether you have control, but 
> need to have control, then you should tag it as undetermined? 
> The QA article doesn't talk about http-equiv at all (a major 
> shortcoming). But why not rather recommend that authors, in that case, use 
> the "und" tag inside the META tag? That way, one could be certain that 
> nothing outside the HTML code affects the language of the document.
> <meta http-equiv="Content-Language" content="und" />
> This would satisfy as a solution in accordance with my description:
> >> for that reason, there should be a way - other than not using 
> >> Content-Language (on the server side) - for making sure that the user 
> >> agent does not inherit the language from Content-Language.
> > So I think that if browsers just implement support for lang="" we 
> > have no issue here. 
> As long as HTML5 will not disallow http-equiv="Content-Language" 
> completely, then yes, I think I could live with declaring the 
> http-equiv="Content-Language" as "und", yes. (Great point, CE!) Should 
> work both in current and future clients. 
Thanks.  Glad you can live with it.
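The fallback being discussed can be sketched roughly as follows. This is only an illustration under stated assumptions, not a description of what any particular browser does: the function name `effective_language` is hypothetical, each source is assumed to carry a single language tag, the meta http-equiv value is assumed to be consulted before the HTTP header, and "und" is assumed to be treated as "no language" rather than passed along.

```python
from typing import Optional


def _normalize(tag: str) -> Optional[str]:
    """Map an empty tag or "und" to None (language unknown)."""
    tag = tag.strip()
    if tag == "" or tag.lower() == "und":
        return None
    return tag


def effective_language(html_lang: Optional[str] = None,
                       meta_content_language: Optional[str] = None,
                       http_content_language: Optional[str] = None) -> Optional[str]:
    """Return the document language a hypothetical user agent would assume.

    None for an argument means "this declaration is absent".
    An empty string for html_lang models an explicit lang="".
    """
    # An explicit lang attribute wins, even when it is lang="":
    # the empty string overrides whatever the fallbacks say.
    if html_lang is not None:
        return _normalize(html_lang)

    # Assumed fallback order: meta http-equiv first, then the HTTP header.
    for declared in (meta_content_language, http_content_language):
        # An absent or empty fallback is skipped; "und" ends the search.
        if declared is not None and declared.strip():
            return _normalize(declared)

    return None


if __name__ == "__main__":
    # No lang attribute: the header language leaks into the document.
    print(effective_language(http_content_language="fr"))
    # lang="" or a meta value of "und" both block that inheritance.
    print(effective_language(html_lang="", http_content_language="fr"))
    print(effective_language(meta_content_language="und",
                             http_content_language="fr"))
```

Under these assumptions, both lang="" and a meta value of "und" keep the header language from being inherited, which is the behaviour the two proposals are after.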
> But if, as you have been toying with, HTML5 removes 
> http-equiv="Content-Language", then it is hard for me to see how one 
> can claim there is no issue.
> > PS: Note that the article referred to at 
> > http://www.w3.org/International/questions/qa-no-language needs 
> > updating to take into account the latest developments in this area.
> I think the article needs to be updated about 7 things, of which only 
> the things related to HTML5 represent a "latest development":
> 1) HTML5 empty string {new issue}
> 2) HTML5 empty string vs legacy UA support {new issue}
> 3) XHTML empty string vs UA support {old issue}
> And when it comes to the QA article's advice "to just not declare the 
> default language of the document", then it should mention the 
> (possible) unwanted language fallback effect of Content-Language
> 4) in legacy XHTML user agents {old issue}
> 5) in legacy (aka HTML4) user agents {old issue}
> 6) in HTML5 UAs {new issue}
> 7) The possibility of using
> <meta http-equiv="Content-Language" content="und" />
> Btw, some data: 
> Based on Opera's MAMA, 9.11% of occurrences of the root (<html>) 
> element on the Web have a @lang. [1] But 1.55% of pages do not include 
> <html> in the code [2], thus only 8% of pages have a @lang on the root 
> element. All these pages will be affected by either HTTP or HTTP-EQUIV 
> Content-Language - now and in the future.
It's the meta http-equiv that many applications insert.
> For http-equiv="Content-Language", 13% of Web pages use it (456078 
> [3] divided by 3503482 [1]), while 1.75% (61240 [4] divided by 3503482 
> [1]) use the "real" Content-Language http header. Thus, up to 15 
> percent of all web pages may use either http-equiv="Content-Language" 
> or http Content-Language (some pages probably use both http-equiv and 
> http). That is a higher percentage than that of pages using the lang="" 
> attribute.
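The shares quoted above can be re-derived from the raw counts given in the message. A minimal sketch, using only those counts; the variable names are illustrative, and the sum is an upper bound because pages using both mechanisms are counted twice:

```python
# Raw counts as quoted in the message (Opera MAMA data).
TOTAL_PAGES = 3503482      # pages surveyed [1]
META_HTTP_EQUIV = 456078   # pages with http-equiv="Content-Language" [3]
HTTP_HEADER = 61240        # pages served with a Content-Language header [4]


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


meta_share = percent(META_HTTP_EQUIV, TOTAL_PAGES)
header_share = percent(HTTP_HEADER, TOTAL_PAGES)

# Upper bound only: a page using both mechanisms is counted twice here.
upper_bound = meta_share + header_share

if __name__ == "__main__":
    print(f"meta http-equiv: {meta_share:.2f}%")
    print(f"HTTP header:     {header_share:.2f}%")
    print(f"either (at most): {upper_bound:.2f}%")
```

The results agree with the rounded figures in the message: about 13% for the meta element, about 1.75% for the header, and a combined share just under 15%.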
I think that this is changing, however gradually.
But I agree with Leif that the meta http-equiv should not go. I like to have at my disposal, as much as possible, all three ways of declaring the document-wide language: the http header, the meta http-equiv element, and the html element.
I hope that these three ways will ultimately be used more as originally intended. In the meantime, we should allow both for their use as originally intended (for legacy, in any case) and for their use as they are actually used. When there is a conflict between how they are intended and how they are actually used, a note on this in a recommendation will serve the purpose. For any problems that the declarations available for these elements might not solve, and for cases of conflicting language information, again a note on this in a recommendation will solve things somewhat.
C. E. Whitehead

Received on Thursday, 22 April 2010 01:14:09 UTC
