Null change proposal for ISSUE-88 (mark IV)

ISSUE-88
========

SUMMARY
There is no problem and the proposed remedy is to change nothing.

RATIONALE
There is no problem.

Another change proposal suggests adding a note on the basis that we should 
clarify why the HTTP and pragma declarations are different to the lang="" 
attribute when it comes to values, and how they should be used, suggesting 
that this is a constant source of confusion.

However, the HTML5 specification already goes to some lengths to alleviate 
this confusion, for example by strongly discouraging the use of the 
pragma, encouraging lang="" use instead, and explicitly requiring that 
conformance checkers warn of this issue where relevant.

It isn't clear that the suggested note would actually do anything further 
to reduce the confusion. If anything, it might make matters worse, by 
offering an (incorrect) rationale for using the pragma. The pragma doesn't 
give metadata about the document. The original intent of the <meta 
http-equiv> feature was to provide a way for _servers_ to include data in 
their HTTP headers on a per-file basis; this isn't document-wide metadata 
for user agents, it's for servers. This original intent also doesn't match 
reality; reality is that this pragma sets the default language for 
lang="", which also isn't document-wide metadata for user agents.

The same change proposal also suggests a second change, namely to change 
the syntax to allow multiple comma-separated language codes, even though 
providing multiple language codes like this would cause the entire pragma 
to be ignored.

User agents vary in their handling of the Content-Language pragma. Some 
user agents interpret a comma-separated list as meaning (contrary to the 
intent of the Content-Language HTTP header) that the root element and its 
descendants, in the absence of any lang="" attribute, are in multiple 
languages. This seems to contradict the model expected by the :lang 
selector and by the lang="" attribute, which assume that each element has 
a single language.

Other user agents treat the comma as part of the language tag, for example 
treating <meta http-equiv="Content-Language" content="en,fr"> as setting a 
pragma-set default language of "en,fr", which can be matched by a selector 
such as ":lang(en\,fr)", and specifically _not_ by ":lang(en)".
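
To illustrate why a pragma-set default language of "en,fr" is not matched 
by ":lang(en)", here is a minimal Python sketch of prefix-style language 
matching, assuming the rule described for the :lang selector in CSS 2.1 (a 
value matches if it equals the given code or begins with that code 
followed by "-"). The function name lang_matches is hypothetical, and this 
is a simplification for illustration, not any user agent's actual code.

```python
def lang_matches(element_lang: str, selector_range: str) -> bool:
    """Simplified :lang() matching (an assumption based on CSS 2.1):
    case-insensitive equality, or the element's language begins with
    the selector's code immediately followed by a hyphen."""
    lang = element_lang.lower()
    rng = selector_range.lower()
    return lang == rng or lang.startswith(rng + "-")


# A UA that keeps the comma treats "en,fr" as one opaque tag.
print(lang_matches("en,fr", "en"))     # False
print(lang_matches("en,fr", "en,fr"))  # True
print(lang_matches("en-US", "en"))     # True
```

Under this rule the comma is just another character in the tag, so 
neither "en" nor "fr" matches on its own.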

(The specification's UA conformance criteria propose a compromise model 
wherein user agents ignore pragmas that specify multiple languages, 
acknowledging that they are multiple languages, but not making any one 
language have a higher priority than the others and not requiring that the 
user agents support the multi-language model, which would require 
significant effort for what is just a legacy feature at this point.)
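
The compromise model can be sketched roughly as follows. This is an 
illustrative Python approximation of the behaviour described above, not 
the specification's exact algorithm; the function name 
pragma_set_default_language is hypothetical, and Python's str.split() is 
used as a stand-in for the specification's own whitespace handling.

```python
from typing import Optional


def pragma_set_default_language(content: Optional[str]) -> Optional[str]:
    """Return the language a Content-Language pragma would set, or None
    if the pragma is ignored (sketch of the compromise model only)."""
    if content is None:
        # No content attribute: nothing to process.
        return None
    if "," in content:
        # Multiple languages: the whole pragma is ignored, so no single
        # language is given priority over the others.
        return None
    tokens = content.split()
    # Take the first whitespace-delimited token, if there is one.
    return tokens[0] if tokens else None


print(pragma_set_default_language("en"))       # en
print(pragma_set_default_language("en,fr"))    # None
print(pragma_set_default_language(" en-US "))  # en-US
```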

Because of the way some legacy UAs handle this pragma, and because 
conforming UAs drop pragmas that specify multiple languages, it would be 
ill-advised for us to make multiple values conforming. The way to mark 
that a document _uses_ multiple languages in such a way that user agents 
can actually parse and find this information is to use the lang="" 
attribute in the document. Putting multiple values in the pragma, as the 
other proposal suggests, would not achieve this.

Another possible use case would be to have a standard way to say who 
the target audience of the document is, but in practice few people use 
that information on the Web, so it doesn't seem like having a pragma that 
exposes this information would be useful, even if we ignore that the user 
agents are currently required to ignore that information.

Even if there were such a need, this feature would be a bad way to provide 
that information, since it is used in an incompatible way by user agents 
(they use this information to determine processing behaviour -- none of 
the languages are treated as a target audience language hint).

For controlled environments, there are a multitude of options available to 
authors, such as the HTTP header of the same name, <meta name> with custom 
names, microdata, RDFa, out-of-band data, <script> blocks, etc. We don't 
need to use this mechanism for that purpose. Doing so would just confuse 
authors further.

No rationale is given for this second change, so it is hard to evaluate 
what the benefit of making this change would be.


DETAILS
Change nothing.

IMPACT

POSITIVE EFFECTS
* Encourages authoring behaviour compatible with both legacy user agents 
and with conforming user agents.
* Flags uses of the pragma in existing documents that are not being 
reliably processed in existing UAs.

NEGATIVE EFFECTS
* Flags uses of the pragma in existing documents that are harmless, such 
as "en,en-US". However, evidence suggests that use of the comma is pretty 
rare anyway:
   http://lists.w3.org/Archives/Public/public-html/2010Apr/0088.html

CONFORMANCE CLASS CHANGES
None.

RISKS
Maybe allowing the pragma at all is not going far enough.


REFERENCES
Tests: http://www.hixie.ch/tests/adhoc/html/meta/content-language/


-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 9 April 2010 00:01:14 UTC