RE: ISSUE-88 / Re: what's the language of a document ?

Ian Hickson, Mon, 22 Feb 2010 06:08:29 +0000 (UTC):
> On Mon, 22 Feb 2010, Phillips, Addison wrote:

>> This pragma can 
>> contain a list of languages. One of these might be inferred to be the 
>> primary (outer) document processing language if the 'lang' attribute is 
>> missing. And that, in a nutshell, is what I think we're wrestling with 
>> here: whether the pragma should be wired up to 'lang' in that case, and, 
>> if it has more than one language, which language should be applied.

Exactly. But applied by whom or what? By CSS? It seems like both the 
HTMLwg and the I18Nwg focus far too much on CSS:

If the user agent is a speech-based screen reader, then I do think that 
we all agree that the user (agent) should be able to override the 
language tagging whenever it is wrong - e.g. if the page says lang="no" 
when it should say lang="ru". (For a concrete example of such tagging, 
see www.gost.ru ...) Such user (agent) overriding is also permitted in 
HTML4: [1]

]] An element inherits language code information according to
   the following order of precedence (highest to lowest):
     [ ... snipped to the bottom of the list ... ]
   * User agent default values and user preferences. [[

But for some - unwritten - reason, we seem to distinguish between what 
we accept from CSS and what we accept from a screen reader: we don't 
accept that the user agent or the user's preferences (re)define the 
document language when it comes to CSS.

To exemplify: Richard/the I18Nwg have some tests of language 
declarations w.r.t. CSS, but none of these tests checks the user's 
ability to override the specified language! [2]

Also, when it comes to CSS, it seems to be typical to expect that the 
<meta> element wins over the server's header. Again, an example of this 
can be seen in Richard/I18Nwg's tests [3].

Why is that? It is always the server that should be most authoritative. 
Or is there some special rule for CSS?
 
> The spec's definition of the Content-Language pragma is specified as it is 
> because that's what user agents do with that pragma. Making it do 
> something else would require changing user agent implementations.

User agents must change regardless: HTML4 refers to "content-language" 
as one and the same thing, whether it is defined by the server or by 
the page: [1]

  ]] The HTTP "Content-Language" header (which may be configured
     in a server). [[

And Richard's tests show that Firefox (and most likely all Mozilla 
browsers) does see them as one and the same feature. [3]

> [...] having it specify multiple languages wouldn't 
> work well with CSS or speech synthesisers, for instance. [...]

If so, then it already is an issue w.r.t. Mozilla web browsers. The 
problem you take up here really belongs in a document about "what to do 
when @lang is lacking or is incorrect?"

I think part of the solution to ISSUE-88 is to realize that 
 (A) User (agents) should have the option to override the language tags 
     - this option should not be reserved to search engines ...
 (B) overriding also applies to CSS (unless we spec that it doesn't)

If we want correct use of content-language, then it looks to me as if 
it would be better if there were no automatic link between 
content-language and @lang. Or, in a way: we could place 
content-language into the lowest precedence box in HTML4 - "User agent 
default values and user preferences" [1] - and specify more detailed 
rules for how user agents should interact with the user and with the 
document when @lang is lacking.
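
As a rough illustration of that idea, here is a minimal Python sketch of 
a lookup where content-language sits in the lowest precedence tier. 
Everything named here (resolve_language, parse_content_language, the 
parameters, and the rule that content-language only counts as a hint 
when it names exactly one language) is an assumption made for 
illustration, not something HTML4, HTML5 or any browser specifies.

```python
from typing import List, Optional, Sequence


def parse_content_language(value: Optional[str]) -> List[str]:
    """Split a content-language value such as "no, ru" into its tags."""
    if not value:
        return []
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def resolve_language(
    lang_chain: Sequence[Optional[str]],
    content_language: Optional[str] = None,
    ua_default: Optional[str] = None,
    user_override: Optional[str] = None,
) -> Optional[str]:
    """Pick a language for an element (hypothetical precedence order).

    lang_chain holds the @lang values of the element itself and then of
    its ancestors, nearest first; None means the attribute is absent.
    """
    # Points (A) and (B): an explicit user choice beats the page's tagging.
    if user_override:
        return user_override
    # @lang on the element, then on the closest ancestor that sets it.
    # Simplification: an empty lang="" is treated like a missing attribute.
    for lang in lang_chain:
        if lang:
            return lang
    # Lowest tier: content-language is only a hint, and (by assumption)
    # only when it names a single language.
    hints = parse_content_language(content_language)
    if len(hints) == 1:
        return hints[0]
    # Otherwise fall back to the user agent default, which may be None.
    return ua_default
```

With this ordering, resolve_language([None, "no"], user_override="ru") 
gives "ru", which covers the lang="no" versus lang="ru" case above, 
while a multi-language value such as "no, ru" is skipped in favour of 
the user agent default.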

Which leads me to my final point: One way to solve this issue could be 
to lift everything that has to do with content-language and user 
(agent) preferences and language detection (when @lang is wrong or 
lacking) out of HTML5 proper and into a separate specification.

[1] http://www.w3.org/TR/html401/struct/dirlang#h-8.1.2
[2] 
http://www.w3.org/International/tests/results/results-lang-declaration
[3] 
http://www.w3.org/International/tests/tests-html-css/tests-language-declarations/generatehtml?test=11
-- 
leif halvard sili

Received on Tuesday, 23 February 2010 05:51:27 UTC