Re: Null change proposal for ISSUE-88 (mark II) from Leif Halvard Silli on 2010-04-04 (public-html@w3.org from April 2010)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Sun, 4 Apr 2010 05:16:05 +0200
To: Maciej Stachowiak <mjs@apple.com>
Cc: Ian Hickson <ian@hixie.ch>, public-html@w3.org
Message-ID: <20100404051605252752.14cc4b24@xn--mlform-iua.no>
Maciej Stachowiak, Sat, 03 Apr 2010 17:51:39 -0700:
> On Apr 3, 2010, at 5:34 PM, Ian Hickson wrote:
>> On Sat, 3 Apr 2010, Leif Halvard Silli wrote:
>>> Ian Hickson, Fri, 2 Apr 2010 18:54:23 +0000 (UTC):

>>>  […]
>>>> Even if there was such a need, this feature would be a bad way to provide
>>>> that information, since it is used in an incompatible way by user agents
>>>> (the first language, and only the first language, is used to determine
>>>> processing behaviour -- none of the languages are treated as a target
>>>> audience language hint).
>>> 
>>> Some incorrectness. See note above.
>> 
>> Indeed. I should have said that it was a bad way to provide the
>> information since it causes user agents other than Mozilla to ignore the
>> information altogether.
> 
> I'd still like to see a test case, so people can check this for 
> themselves. Given Leif's information, here's my take (personal 
> opinion only):

(Done: http://lists.w3.org/Archives/Public/public-html/2010Apr/0101 )
 
> I think the processing requirements should be updated to match 
> Mozilla (so implementations are permissive in what they accept). 

This is something I perhaps could accept. But it depends on 2-3 other 
issues:

1) Spec must say that it is the *last* <meta> c-l that counts. 
   (Currently spec asks all UAs to change to prioritize the first.)
2) It must be permitted to use empty space inside <meta> c-l (like in
   HTML4).
   Empty space is currently the only thing that *cancels* the effect of
   <meta> content-language *and* the content-language that comes from
   the server.
   (Please note that Mozilla also looks at the server!)
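
The two conditions above can be sketched roughly as follows. This is only an illustration, not spec text or any browser's actual code: the function name and parameters are made up for the example, and how a comma-separated value should be treated is deliberately left out.

```python
from typing import Optional, Sequence


def fallback_language(meta_values: Sequence[str],
                      http_header: Optional[str] = None) -> Optional[str]:
    """Return the raw Content-Language value to use for fallback, or None.

    Hypothetical helper. meta_values holds the content attributes of all
    <meta> c-l elements in document order; http_header is the server's
    Content-Language header, if one was sent.
    """
    if meta_values:
        # Condition 1: the *last* <meta> c-l is the one that counts.
        last = meta_values[-1].strip()
        # Condition 2: a whitespace-only value cancels the fallback,
        # including whatever the server sent.
        return last or None
    # No <meta> c-l at all: fall back to the server's header.
    if http_header is not None and http_header.strip():
        return http_header.strip()
    return None
```

With this sketch, fallback_language(["en", " "], "fr") returns None, because the trailing whitespace-only element cancels both the earlier element and the server header.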

However, I personally think that it would be much better if Mozilla 
changed! Even if Mozilla changed, I still think that 1) and 2) are 
necessary. I think it is good and ideal if user agents do not use 
<meta> content-language (or the server's content-language header) for 
language fallback whenever it contains more than one language. (Here I 
disagree with the I18N wg, apparently. Also, my own opinion has 
changed.) Authors should instead use <html lang="*">. And also, I think 
it could "break the web" to suddenly start to honor all language tags 
inside <meta> c-l.

If we really do change anything deep about what UAs do, then we should 
require them to give priority to what the server says over what the 
<meta> c-l says - as this is the logical order, and the order in which 
the encoding, for example, is determined. Validators could then check 
whether the <meta> c-l says the same as the server says - that is also, 
I believe, what Validator.nu does when the <meta> charset differs from 
the HTTP-served encoding header. If the <meta> charset differs, then 
the author should update it to say the same as the server - or simply 
remove it. (Or provide a white-space filled <meta> c-l, as this cancels 
unwanted fallback effects in legacy browsers.)
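
A rough sketch of the validator check suggested above, for illustration only. It does not describe how Validator.nu or any other validator works; the function names and the warning text are invented for the example. It assumes the server's header is a comma-separated list of language tags compared case-insensitively.

```python
from typing import List, Optional


def language_tags(value: Optional[str]) -> List[str]:
    """Split a Content-Language value into lower-cased language tags."""
    if value is None:
        return []
    return [tag.strip().lower() for tag in value.split(",") if tag.strip()]


def check_meta_against_server(meta_value: Optional[str],
                              http_header: Optional[str]) -> Optional[str]:
    """Return a warning when <meta> c-l disagrees with the server, else None.

    Hypothetical check: the server is treated as authoritative, so a
    mismatch is reported against the <meta> element.
    """
    if meta_value is None or http_header is None:
        return None  # nothing to compare
    if not meta_value.strip():
        return None  # whitespace-only <meta> c-l only cancels fallback
    if language_tags(meta_value) != language_tags(http_header):
        return ("<meta> content-language %r differs from the HTTP "
                "Content-Language header %r; update or remove it"
                % (meta_value, http_header))
    return None
```

Under these assumptions, check_meta_against_server("en", "nb") yields a warning, while a whitespace-only <meta> c-l passes silently.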

> But 
> the authoring requirements should allow only a single value, to 
> maintain compatibility with legacy UAs (since a comma would cause 
> non-Mozilla pre-HTML5 browsers to ignore the language information 
> entirely).

This is something I find harder to accept. Maybe if you said that 
authors were required to provide a second (a last) <meta> c-l element 
which cancels the effect (by containing whitespace). Validators should 
then not look at anything other than the last <meta> content-language 
element. 

First of all: Remember that we talk about a fallback feature - a side 
effect of the <meta> content-language element. 

Secondly: The most important thing is that <element lang="*"> works as 
it should. (And WebKit has many bugs there!) 

Thirdly: Personally, I see it (*at least currently*) as a *benefit* 
that user agents do *not* use the <meta> content-language 
element/header for fallback whenever it contains more than one 
language. And the reason I want permission to use white-space inside 
the <meta> content-language element is to *cancel* the language 
fallback effect in all browsers, including Mozilla. (Whitespace is the 
only thing which cancels it in Mozilla.) After all, authors should use 
<html lang="*"> rather than <meta> content-language.

It is important to understand the relationship between @lang and <meta> 
c-l. If you do <element lang="">, then the language of this element 
should be set to unknown. But many user agents in this case currently 
go looking for what the <meta> c-l says. This is a problem. I can 
provide use cases for what I mean, especially if you express that you 
are interested. ;-) 
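
The intended relationship between @lang and the <meta> c-l fallback can be sketched like this. It is an illustration of the behaviour argued for above, not of what any current user agent does; the function name and the list-based representation of the ancestor chain are assumptions made for the example.

```python
from typing import Optional, Sequence


def element_language(lang_chain: Sequence[Optional[str]],
                     fallback: Optional[str]) -> str:
    """Resolve an element's language; "" stands for "unknown".

    Hypothetical helper. lang_chain lists the lang attribute values from
    the element itself up to the root element, with None where the
    attribute is absent. fallback is whatever <meta> c-l or the server
    supplied, if anything.
    """
    for lang in lang_chain:
        if lang is not None:
            # The nearest lang attribute wins. lang="" means the
            # language is explicitly unknown, so the fallback must
            # NOT be consulted.
            return lang
    # Only when no lang attribute exists anywhere does fallback apply.
    return fallback or ""
```

So element_language(["", "en"], "nb") gives "" (unknown), whereas the problematic behaviour described above would answer "nb".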

We really should not try to make <meta> c-l useful. We should instead 
try to make it less useful - for language fallback. Its use should 
instead be in the domain that HTTP has defined for it. Clearly <meta> 
c-l is much less useful inside a document than <meta> charset is.

[What I said above was probably not clear enough. Please ask for 
clarifications.]
-- 
leif halvard silli
Received on Sunday, 4 April 2010 03:16:39 UTC