- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Wed, 7 Apr 2010 05:17:45 +0200
- To: Ian Hickson <ian@hixie.ch>
- Cc: public-html@w3.org
Ian Hickson, Sun, 4 Apr 2010 01:01:53 +0000 (UTC):

(I'm sorry that I have to repeat myself to comment on this proposal ...)

....
> As far as I am aware, no bug pointing to confusion on this subject and
> asking for clarification has been rejected, which makes using the change
> proposal process inappropriate.

Didn't get the rationale. Agree with the conclusion: one could delay the change proposal process. There are bugs to solve.

> The same change proposal also suggests a second change, namely to change
> the syntax to allow multiple comma-separated language codes, even though
> all but the first would be ignored.

Actually, this is not what the spec says, is it? As I read it, whenever the content attribute contains a comma-separated list of language tags, it thereby contains a comma character, and so the whole value is ignored. Quoting the third step of the algorithm: [1]

]]
3. If the element's content attribute contains a U+002C COMMA character (,) then abort these steps.
[[

So what do you mean by saying that "all but the first would be ignored"?

> User agents vary in their handling of the Content-Language pragma. Some
> user agents support a comma-separated list as meaning (contrary to the
> intent of the Content-Language HTTP header) that the root element and its
> descendants, in the absence of any lang="" attribute, are in multiple
> languages. This seems to contradict the model expected by the :lang
> selector and by the lang="" attribute, which assume that each element has
> a single language.

Yes, Mozilla UAs behave like this.

> Other user agents treat the comma as part of the language tag, for example
> treating <meta http-equiv="Content-Language" content="en,fr"> as setting a
> pragma-set default language of "en,fr", which can be matched by a selector
> such as ":lang(en\,fr)", and specifically _not_ by ":lang(en)".

Yes, all non-Mozilla UAs behave more or less like this.
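To make the difference between the two behaviours concrete, here is a minimal Python sketch. It is only an illustration, not taken from any browser: the function names are hypothetical, and the matching rule assumed is the CSS 2.1 one for :lang(C) (the element's language equals C, or begins with C immediately followed by "-", compared case-insensitively). The selector ":lang(en\,fr)" is represented by its unescaped identifier "en,fr".

```python
from typing import List


def lang_matches(element_language: str, selector_range: str) -> bool:
    """Assumed CSS 2.1 :lang(C) rule: equal to C, or starts with C + '-'."""
    value = element_language.lower()
    wanted = selector_range.lower()
    return value == wanted or value.startswith(wanted + "-")


def languages_list_model(content: str) -> List[str]:
    """Mozilla-style reading: every comma-separated item is a language."""
    return [tag.strip() for tag in content.split(",") if tag.strip()]


def languages_literal_model(content: str) -> List[str]:
    """Reading of the other UAs: the whole value, comma included, is one tag."""
    return [content]


def matches_any(languages: List[str], selector_range: str) -> bool:
    """True if at least one of the element's languages matches the selector."""
    return any(lang_matches(lang, selector_range) for lang in languages)


if __name__ == "__main__":
    content = "en,fr"
    for selector in ("en", "fr", "en,fr"):
        print(
            selector,
            "list model:", matches_any(languages_list_model(content), selector),
            "literal model:", matches_any(languages_literal_model(content), selector),
        )
```

Under the list model ":lang(en)" and ":lang(fr)" both match; under the literal model only ":lang(en\,fr)" does.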
> (The specification's UA conformance criteria propose a compromise model
> wherein the user agents aren't required to support multiple languages per
> element, but still interpret the comma correctly, rather than treating it
> as part of the language code.)

I can't get this to square with "abort these steps" - see my spec quote above. But anyway, I have filed some bugs, whose message is a different compromise model:

1) Comma separated list is allowed: content="en,fr"
2) But UAs handle it like *:lang(en\,fr)

This is close to 100% how all user agents, except Mozilla, do it today. (The issues that Opera has with commas are not worth counting, because authors usually do not include illegal language tags in their CSS selectors. The other Opera issue *could* be worth counting - see below.) Thus there should be very low risk in speccing it, and it should be simple for vendors to implement.

> Because of the way some legacy UAs handle this pragma, and because the
> behaviour of conforming UAs drops all but the first language,

Using HTML5's definition of conforming doesn't seem meaningful, since we can change it. No UAs behave in the "conforming" way today.

> it would be
> ill advised for us to make multiple values conforming. The way to mark
> that a document _uses_ multiple languages in such a way that user agents
> can actually parse and find this information is to use the lang=""
> attribute in the document. Putting multiple values in the pragma would
> fail to handle this according to the proposal.

I would turn this around: By allowing a comma-separated list, but still demanding that user agents treat it like they treat an illegal lang="en,fr" - namely as *:lang(en\,fr) - we make sure that it doesn't work as a (meaningful) language selector, thereby prompting authors to use lang="*" instead.

...
> IMPACT
>
> POSITIVE EFFECTS
> * Encourages authoring behaviour compatible with both legacy user agents
> and with conforming user agents.
To parse <META content-language content="en,fr"> the same way as <html lang="en,fr"> seems to me like a better way to encourage correct authoring behaviour. It must also be confusing to authors when they discover (if they understand it at all ...) that

<meta http-equiv="content-language" content="en,fr">

as well as

<meta http-equiv="content-language" content="<emptystring>">

will cause user agents to go looking for what the server has to say (instead of having the same meaning as <html lang="en,fr"> and <html lang="<emptystring>">). See below for more explanation.

> * Flags uses of the pragma in existing documents that are not being
> reliably processed in existing UAs.

More specifically: It flags a use of the pragma where *Mozilla* web browsers behave differently from the rest. But there are important things which the current spec text does not flag:

(1) The reference section at the bottom of your e-mail points to a test which reveals that Opera can't even correctly handle a content-language declaration which contains only a single language tag! (It then treats <p> elements in one way, and <div> in another.) This is not flagged ...

(2) If the value of the content-language element is the empty string, then Mozilla looks for the preceding META element, or consults the HTTP header from the server. But still, the empty string is not flagged. Instead, the specced text *encourages* a behaviour which is quite similar to the Mozilla behaviour: [1]

]]
Until the pragma is successfully processed, there is no pragma-set default language. [ ... snip ...]
2. If the meta element has no content attribute, or if that attribute's value is the empty string, then abort these steps.
[[

Thus: empty string means that no pragma default language gets set. And then: [2]

]]
If there is no pragma-set default language set, then language information from a higher-level protocol (such as HTTP), if any, must be used as the final fallback language instead.
[[

See my test page for the empty string issue. [3] Getting user agents to look in the same spots for fallback info seems like a very useful step towards making them behave the same way. The simplest way to do this is to make user agents treat content="*" like they treat lang="*". This should also be simple to agree on, as long as Mozilla is willing to change its behaviour, since the lang="*" behaviour is not contested.

> NEGATIVE EFFECTS
> * Flags uses of the pragma in existing documents that are harmless, such
> as "en,en-US". However, evidence suggests that use of the comma is pretty
> rare anyway:
> http://lists.w3.org/Archives/Public/public-html/2010Apr/0088.html

If they are "pretty rare", then I think we should insist on treating content="en,fr" the same way as lang="en,fr" is treated. This will also ensure that UAs look no further than the latest META content-language declaration, instead of causing them to - like Mozilla - go looking at the server.

> NEGATIVE EFFECTS
> * Flags uses of the pragma in existing documents that are harmless, such
> as "en,en-US". However, evidence suggests that use of the comma is pretty
> rare anyway:
> http://lists.w3.org/Archives/Public/public-html/2010Apr/0088.html

There is a gap in that evidence: He did not count what the server says. E.g. if x-ua-compatible remains forbidden in HTML5, we may not see it often in pages (we must at least hope so). However, authors could still be configuring it on their web server. Likewise, just because content-language with multiple language tags isn't found in pages doesn't mean that it isn't used on servers. I say so because Mozilla both looks at what the server says and supports multiple values *and* is afraid of breaking the Web. [4] I could imagine that servers do use multiple languages, since it makes more sense to do this on the server side than in a document.

....
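The fallback difference discussed above (steps 2 and 3 as quoted from [1], plus the HTTP fallback quoted from [2], versus treating content="*" like lang="*") can be sketched roughly as follows. This is only a Python illustration under stated assumptions: the function names are hypothetical, the spec steps that are not quoted in this mail are replaced by a simple "use the trimmed value" placeholder, and the second function is my reading of the proposal, with None standing for "no content-language META present".

```python
from typing import Optional


def pragma_per_quoted_steps(content: Optional[str]) -> Optional[str]:
    """Return the pragma-set default language, or None if the steps abort."""
    # Step 2 as quoted: no content attribute, or empty string -> abort.
    if content is None or content == "":
        return None
    # Step 3 as quoted: a U+002C COMMA anywhere in the value -> abort.
    if "," in content:
        return None
    # Assumption: the remaining (unquoted) steps are approximated by
    # taking the whitespace-trimmed value.
    value = content.strip()
    return value or None


def language_per_quoted_steps(content: Optional[str],
                              http_language: Optional[str]) -> Optional[str]:
    """Current spec text: without a pragma-set language, fall back to HTTP."""
    pragma = pragma_per_quoted_steps(content)
    return pragma if pragma is not None else http_language


def language_like_lang_attribute(content: Optional[str],
                                 http_language: Optional[str]) -> Optional[str]:
    """Proposed reading: a present content value is used as-is, like lang."""
    if content is None:  # no pragma in the document at all
        return http_language
    return content  # "" means unknown language; "en,fr" stays one literal tag


if __name__ == "__main__":
    for content in ("en", "en,fr", "", None):
        print(repr(content),
              repr(language_per_quoted_steps(content, "de")),
              repr(language_like_lang_attribute(content, "de")))
```

With an HTTP header of "de", the quoted steps yield "de" for both content="en,fr" and content="", whereas the lang-like reading yields "en,fr" and the empty string, without ever consulting the server.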
> REFERENCES
> Tests: http://www.hixie.ch/tests/adhoc/html/meta/content-language/

[1] http://dev.w3.org/html5/spec/semantics#attr-meta-http-equiv-content-language
[2] http://dev.w3.org/html5/spec/dom#the-lang-and-xml:lang-attributes
[3] http://www.malform.no/testing/html5/content-language-empty-string/#n2
[4] http://lists.w3.org/Archives/Public/public-html/2010Apr/0131
--
leif halvard silli
Received on Wednesday, 7 April 2010 03:18:21 UTC