Re: Null change proposal for ISSUE-88 (mark III) from Leif Halvard Silli on 2010-04-07 (public-html@w3.org from April 2010)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Wed, 7 Apr 2010 05:17:45 +0200
To: Ian Hickson <ian@hixie.ch>
Cc: public-html@w3.org
Message-ID: <20100407051745935676.641fb052@xn--mlform-iua.no>
Ian Hickson, Sun, 4 Apr 2010 01:01:53 +0000 (UTC):

(I'm sorry that I have to repeat myself to comment on this proposal ...)

 ....
> As far as I am aware, no bug pointing to confusion on this subject and
> asking for clarification has been rejected, which makes using the change
> proposal process inappropriate.

I didn't get the rationale, but I agree with the conclusion: one could 
delay the change proposal process. There are bugs to solve first.

> The same change proposal also suggests a second change, namely to change 
> the syntax to allow multiple comma-separated language codes, even though 
> all but the first would be ignored.

Actually, this is not what the spec says, is it? The spec says that 
whenever the content attribute contains a comma-separated list of 
language tags, it thereby contains a comma character, and the whole 
value is ignored. Quoting the third step of the algorithm: [1]

   ]]
3. If the element's content attribute contains a U+002C COMMA character 
(,) then abort these steps. 
   [[

So what do you mean by saying that "all but the first would be ignored"?

> User agents vary in their handling of the Content-Language pragma. Some 
> user agents support a comma-separated list as meaning (contrary to the 
> intent of the Content-Language HTTP header) that the root element and its 
> descendants, in the absence of any lang="" attribute, are in multiple 
> languages. This seems to contradict the model expected by the :lang 
> selector and by the lang="" attribute, which assume that each element has 
> a single language.

Yes, Mozilla UAs behave like this.

> Other user agents treat the comma as part of the language tag, for example 
> treating <meta http-equiv="Content-Language" content="en,fr"> as setting a 
> pragma-set default language of "en,fr", which can be matched by a selector 
> such as ":lang(en\,fr)", and specifically _not_ by ":lang(en)".

Yes, all non-Mozilla UAs behave more or less like this.

> (The specification's UA conformance criteria propose a compromise model 
> wherein the user agents aren't required to support multiple languages per 
> element, but still interpret the comma correctly, rather than treating it 
> as part of the language code.)

I can't reconcile this with "abort these steps" - see my spec quote 
above. But anyway, I have filed some bugs which propose a different 
compromise model:

  1) Comma separated list is allowed: content="en,fr"
  2) But UAs handle it like *:lang(en\,fr)

This is close to 100% how all user agents, except Mozilla, do it 
today. (The issues that Opera has with commas are not worth counting, 
because authors usually do not include illegal language tags in their 
CSS selectors. The other Opera issue *could* be worth counting - 
see below.) Thus there should be very low risk in speccing it. And it 
would be simple for vendors to implement.
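
A minimal sketch of why the model above keeps "en,fr" from working as a meaningful language selector. It assumes the prefix matching that CSS defines for :lang() (the language equals the argument, or begins with the argument followed by "-"). The function name lang_matches is hypothetical, not taken from any spec or browser.

```python
def lang_matches(element_language: str, selector_argument: str) -> bool:
    """Approximate CSS :lang() matching: equal, or a prefix followed by '-'."""
    lang = element_language.lower()
    arg = selector_argument.lower()
    return lang == arg or lang.startswith(arg + "-")


# Model from the two points above: the whole content value, comma included,
# becomes the language, exactly as an illegal lang="en,fr" would.
pragma_language = "en,fr"

print(lang_matches(pragma_language, "en,fr"))  # True: selector written as :lang(en\,fr)
print(lang_matches(pragma_language, "en"))     # False: "en,fr" is not "en" or "en-..."
print(lang_matches("en-US", "en"))             # True: ordinary prefix match
```

Under this assumption, only a selector that spells out the comma matches, which is the behaviour described for the non-Mozilla user agents.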

> Because of the way some legacy UAs handle this pragma, and because the 
> behaviour of conforming UAs drops all but the first language,

Using HTML5's definition of conforming doesn't seem meaningful, since 
we can change it. No UAs behave in the "conforming" way today.

> it would be 
> ill advised for us to make multiple values conforming. The way to mark 
> that a document _uses_ multiple languages in such a way that user agents 
> can actually parse and find this information is to use the lang="" 
> attribute in the document. Putting multiple values in the pragma would 
> fail to handle this according to the proposal.

I would turn this around: By allowing a comma-separated list, but still 
demanding that user agents treat it like they treat an illegal 
lang="en,fr" - namely as *:lang(en\,fr) - we make sure that it doesn't 
work as a (meaningful) language selector, thereby prompting authors to 
use lang="*" instead.

...
> IMPACT
> 
> POSITIVE EFFECTS
> * Encourages authoring behaviour compatible with both legacy user agents 
> and with conforming user agents.

Parsing <meta http-equiv="content-language" content="en,fr"> the same 
way as <html lang="en,fr"> seems to me like a better way to encourage 
correct authoring behaviour.

It must also be confusing to authors when they discover (if they 
understand it at all ...) that <meta http-equiv="content-language" 
content="en,fr"> as well as <meta http-equiv="content-language" 
content="<emptystring>"> will cause user agents to go looking for what 
the server has to say (instead of having the same meaning as <html 
lang="en,fr"> and <html lang="<emptystring>">). See below for more 
explanation.

> * Flags uses of the pragma in existing documents that are not being 
> reliably processed in existing UAs.

More specifically: It flags a use of the pragma where *Mozilla* web 
browsers behave differently from the rest. But there are important 
things which the current spec text doesn't flag:

   (1) The reference section at the bottom of your e-mail points to a 
test which reveals that Opera can't even correctly handle a 
content-language declaration which only contains a single language tag! 
(It then treats <p> elements in one way, and <div> in another.) This is 
not flagged ...

   (2) If the value of the content-language declaration is the empty 
string, then Mozilla looks for the preceding META element, or consults 
the HTTP header from the server. But still, the empty string is not 
flagged. Instead, the specced text *encourages* a behaviour which is 
quite similar to the Mozilla behaviour: [1]

]] 
   Until the pragma is successfully processed, there is no pragma-set 
default language. [ ... snip ...]
2. If the meta element has no content attribute, or if that attribute's 
value is the empty string, then abort these steps. 
[[

Thus: empty string means that no pragma default language gets set. And 
then:  [2]

]]
   If there is no pragma-set default language set, then language 
information from a higher-level protocol (such as HTTP), if any, must 
be used as the final fallback language instead.
[[
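
Taken together, the steps quoted in this message can be sketched roughly as follows. This covers only the quoted steps 2 and 3 and the quoted fallback rule. The function names are hypothetical, the handling of a value that passes both checks (take the first whitespace-delimited token) is an assumption about the unquoted steps, and the rule about an earlier, already processed META element is left out.

```python
from typing import Optional


def pragma_set_default_language(content: Optional[str]) -> Optional[str]:
    """Return the pragma-set default language, or None if the steps abort."""
    # Quoted step 2: no content attribute, or the empty string -> abort.
    if content is None or content == "":
        return None
    # Quoted step 3: a U+002C COMMA anywhere in the value -> abort.
    if "," in content:
        return None
    # Assumption (not quoted above): use the first whitespace-delimited token.
    tokens = content.split()
    return tokens[0] if tokens else None


def fallback_language(content: Optional[str],
                      http_language: Optional[str]) -> Optional[str]:
    """Quoted fallback rule: with no pragma-set language, use the higher-level protocol."""
    pragma = pragma_set_default_language(content)
    return pragma if pragma is not None else http_language


# Both the empty string and a comma-separated list fall through to the server value.
print(fallback_language("", "nb"))       # nb
print(fallback_language("en,fr", "nb"))  # nb
print(fallback_language("en", "nb"))     # en
```

In this sketch content="" and content="en,fr" end up in the same place: neither sets a pragma-set default language, so whatever the server says is used.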

See my test page for the empty string issue. [3] Getting user agents to 
look in the same spots for fallback info seems like a very useful step 
towards making them behave the same way. The simplest way to do this is 
to make user agents treat content="*" like they treat lang="*". This 
should also be simple to agree on, as long as Mozilla is willing to 
change their behaviour, since the lang="*" behaviour is not contested.

> NEGATIVE EFFECTS
> * Flags uses of the pragma in existing documents that are harmless, such 
> as "en,en-US". However, evidence suggests that use of the comma is pretty 
> rare anyway:
>    http://lists.w3.org/Archives/Public/public-html/2010Apr/0088.html


If they are "pretty rare", then I think we should insist on treating 
content="en,fr" the same way as lang="en,fr" is treated. This will also 
ensure that UAs look no further than the latest META 
content-language declaration, instead of causing them to - like Mozilla 
- go looking at what the server says.

There is a gap in that evidence: He did not count what the server says. 
E.g. if x-ua-compatible remains forbidden in HTML5, we may not see it 
often in pages (we must at least hope so). However, authors could still 
be configuring it on their web server. Likewise, just because 
content-language with multiple language tags isn't found in pages 
doesn't mean that it isn't used on servers. I say so because 
Mozilla both looks at what the server says and supports multiple 
values *and* is afraid of breaking the Web. [4] I could imagine that 
servers do use multiple languages, since it makes more sense to do this 
on the server side than in a document.

  ....
> REFERENCES
> Tests: http://www.hixie.ch/tests/adhoc/html/meta/content-language/


[1] 
http://dev.w3.org/html5/spec/semantics#attr-meta-http-equiv-content-language

[2] http://dev.w3.org/html5/spec/dom#the-lang-and-xml:lang-attributes

[3] 
http://www.malform.no/testing/html5/content-language-empty-string/#n2

[4] http://lists.w3.org/Archives/Public/public-html/2010Apr/0131

-- 
leif halvard silli
Received on Wednesday, 7 April 2010 03:18:21 UTC