Re: ISSUE-88 / Re: what's the language of a document ?

On Thu, 1 Apr 2010, Maciej Stachowiak wrote:
> Ian, comments on the two points below would be appreciated.

My position hasn't changed since this was last proposed:

> > [[
> > [3] Change:
> > "For meta elements with an http-equiv attribute in the Content Language
> > state, the content attribute must have a value consisting of a valid BCP 47
> > language code. [BCP47]"
> > to
> > "For meta elements with an http-equiv attribute in the Content Language
> > state, the content attribute must have a value consisting of one or more
> > valid BCP 47 language codes, separated by commas. [BCP47]"
> > ]]
> > 
> > Since the algorithm just above this text now allows for treatment of a
> > comma-separated list of values in determining the pragma-set default
> > language, we suspect that it might be an oversight that this text wasn't
> > changed.

It was not an oversight.

I do not think allowing multiple values is a good idea, because it doesn't 
match reality. User agents do not pay any attention to values after the 
first. The right way to mark that a document _uses_ multiple languages is 
to use the lang="" attribute in the document. There is no reason to have a 
standard way to say who the target audience of the document is, since in 
practice few people use that information on the Web. Even if there was 
such a need, this feature would be a bad way to provide that information, 
since it is used in an incompatible way by user agents (the first 
language, and only the first language, is used to determine processing 
behaviour). For controlled environments, there are a multitude of options 
available to authors, such as <meta name> with custom names, microdata, 
RDFa, out-of-band data, <script> blocks, etc. We don't need to use this 
mechanism for that purpose. Doing so would just confuse authors further.
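
To make the "first language only" behaviour concrete, here is a minimal 
sketch in Python. It is an illustration under assumptions, not the spec's 
algorithm or any browser's code: the function name is hypothetical, and it 
assumes that values are separated by commas and that surrounding 
whitespace is insignificant.

```python
def pragma_default_language(content: str) -> str:
    """Return the only part of a Content-Language pragma value that is used.

    Assumption for illustration: the value is a comma-separated list and
    everything after the first comma is ignored for processing purposes.
    """
    # Split at most once; anything after the first comma is discarded.
    first = content.split(",", 1)[0]
    # Treat leading and trailing whitespace as insignificant (assumption).
    return first.strip()
```

Under these assumptions, an author who writes content="en, fr" gets the 
same processing as one who writes content="en"; the second value has no 
effect.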

> > [[
> > [2] Add an additional note just before the numbered list in the section
> > about Content language state, with the following text:
> > 
> > "Note: Declarations in the HTTP header and the Content Language pragma are
> > metadata, referring to the document as a whole and expressing the expected
> > language or languages of the audience of the document. On the other hand, a
> > language attribute on an element describes the actual language used in the
> > range of content bounded by that element (and so values are limited to a
> > single language at a time)."
> > 
> > Rationale: To clarify why the HTTP and pragma declarations are different
> > when it comes to values, and how they should be used. This is a constant
> > source of confusion.
> > ]]
> > 
> > On balance, we would still prefer to see a note of this kind in the spec, if
> > the editor agrees.

The above note is wrong in practice. The pragma doesn't give metadata 
about the document. The original intent of the <meta http-equiv> feature 
was to provide a way for _servers_ to include data in their HTTP headers 
on a per-file basis. This isn't document-wide metadata for user agents, 
it's for servers. This original intent doesn't match reality; reality is 
that this pragma sets the default language for lang="". That also isn't 
document-wide metadata for user agents.
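
A rough sketch of what "sets the default language for lang" means, in 
Python. The Node class and function name are hypothetical, made up for 
illustration; this models the fallback described above, not any 
particular implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for an element; lang=None means no lang attribute."""
    lang: Optional[str] = None
    parent: Optional["Node"] = None


def effective_language(node: Node, pragma_default: Optional[str]) -> Optional[str]:
    """Return the language that applies to `node`.

    The nearest lang attribute on the node or an ancestor wins; the pragma
    value is consulted only when no such attribute exists.
    """
    current: Optional[Node] = node
    while current is not None:
        if current.lang is not None:
            return current.lang
        current = current.parent
    # No lang attribute anywhere up the tree: fall back to the pragma.
    return pragma_default
```

In this model the pragma is only ever a fallback for elements with no 
lang attribute in scope, which is why it does not behave like a 
description of the document as a whole.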

If there is a "constant source of confusion", then what we need is 
pointers to this confusion, so that text intended specifically to address 
that confusion is included in the spec. I do not believe the text above 
would reduce confusion; I believe it would cause it.

(Note that the proposed note above doesn't actually even match the stated 
rationale, as far as I can tell.)

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
                          U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Thursday, 1 April 2010 21:20:28 UTC