Re: ISSUE-88 / Re: what's the language of a document ? from Roy T. Fielding on 2010-02-23 (www-international@w3.org from January to March 2010)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Mon, 22 Feb 2010 22:35:29 -0800
To: Ian Hickson <ian@hixie.ch>
Cc: "Phillips, Addison" <addison@amazon.com>, Mark Davis ☕ <mark@macchiato.com>, "www-international@w3.org" <www-international@w3.org>, HTMLwg WG <public-html@w3.org>
Message-Id: <A98ED745-FD63-4C3C-AF41-507B05E2C915@gbiv.com>

On Feb 21, 2010, at 10:08 PM, Ian Hickson wrote:

> On Mon, 22 Feb 2010, Phillips, Addison wrote:
>> 
>> The problem that Mark (and Richard) are referring to (I think) is the 
>> <meta> pragma, which is not currently and should not be changed to be, 
>> IMHO, considered the "primary" language of the document. This pragma can 
>> contain a list of languages. One of these might be inferred to be the 
>> primary (outer) document processing language if the 'lang' attribute is 
>> missing. And that, in a nutshell, is what I think we're wrestling with 
>> here: whether the pragma should be wired up to 'lang' in that case, and, 
>> if it has more than one language, which language should be applied.
> 
> The spec's definition of the Content-Language pragma is specified as it is 
> because that's what user agents do with that pragma. Making it do 
> something else would require changing user agent implementations.

Sorry, that simply isn't true.  Most of what is written in the section
on "Pragma directives", aside from the behavioral algorithms that only
apply during browser rendering, is just made-up constraints that don't
actually exist in practice and don't make any sense regardless.  The
Content-Language value, for example, has only recently been used as a
default for primary language by a few user agents; the fact that a
default only makes sense when one language is given does not in any way
change the definition or purpose of Content-Language.  It should only
affect the language choice algorithm, which doesn't even belong in
that section.

The sensible editorial decision would be to describe the effect
of content-language metadata within a section on "figuring out the
primary language" rather than assume this browser-specific
behavior is somehow definitive on the meaning of Content-Language
as metadata.  Doing so would not require changing user agent
implementations, the algorithm would remain exactly as you want
it to be described, and the result would be free of the definition
bugs that you introduced by redefining common document metadata
as some sort of browser instruction.
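
As a rough illustration of that separation, here is a minimal sketch in Python. The function names are hypothetical and the fallback rule (use the metadata only when it lists exactly one language) is an assumption based on the behavior discussed in this thread, not text from any spec. The metadata stays a plain list of languages; only the language choice algorithm decides whether it can serve as a default.

```python
from typing import List, Optional


def parse_content_language(value: str) -> List[str]:
    """Split a Content-Language value into its comma-separated language tags."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def primary_language(lang_attr: Optional[str],
                     content_language: Optional[str]) -> Optional[str]:
    """Pick a default processing language without redefining the metadata.

    Simplification (assumption): an empty lang attribute is treated as missing.
    """
    if lang_attr:                      # an explicit lang attribute always wins
        return lang_attr
    tags = parse_content_language(content_language or "")
    if len(tags) == 1:                 # a default only makes sense for one language
        return tags[0]
    return None                        # zero or several languages: no default


print(primary_language(None, "en"))        # en
print(primary_language(None, "en, fr"))    # None
print(primary_language("de", "en, fr"))    # de
```

Note that parse_content_language never changes meaning depending on how many tags it finds; the single-language restriction lives entirely in primary_language.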

The existing description is completely wrong for content within a
content management system, for example.  The content-language
(audience) for a page might often differ, intentionally, from the
lang attribute that is used to describe the language of a given
block of text -- the most common examples of that are found in
language-learning exercises and poetry/lyric translations.
Likewise, content-language metadata in HTML is often used to
populate content negotiation data on the server, and to influence
workflow decisions for multilingual websites when an author updates
the content on one page (e.g., triggering language-specific alerts to
the people responsible for translating the page to other languages).
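
To make the audience/text distinction concrete, the following sketch (Python standard library only; the class name and the sample page are made up for illustration) reads the content-language metadata and the lang attributes of a small language-learning page as two separate pieces of information, roughly the way a content management system might before deciding whom to alert.

```python
from html.parser import HTMLParser


class LanguageScan(HTMLParser):
    """Collect audience metadata and per-element lang attributes separately."""

    def __init__(self):
        super().__init__()
        self.audience = []      # tags from <meta http-equiv="Content-Language">
        self.text_langs = []    # values of lang attributes, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        http_equiv = (attrs.get("http-equiv") or "").lower()
        if tag == "meta" and http_equiv == "content-language":
            content = attrs.get("content") or ""
            self.audience = [t.strip() for t in content.split(",") if t.strip()]
        if attrs.get("lang"):
            self.text_langs.append(attrs["lang"])


# Hypothetical page: written for English speakers, containing French text.
PAGE = """<html><head>
<meta http-equiv="Content-Language" content="en">
</head><body>
<p>Repeat after me:</p>
<p lang="fr">Je voudrais un café.</p>
</body></html>"""

scan = LanguageScan()
scan.feed(PAGE)
print(scan.audience)    # ['en']
print(scan.text_langs)  # ['fr']
```

Here the audience is English while the marked-up text is French, and neither value is wrong; they simply answer different questions.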

In any case, the http-equiv attribute exists for one and only one
purpose: to associate the metadata name with the HTTP header field
registry, as opposed to the unbounded name attribute.  It was the
first incarnation of a profile indicator.  Its named values are defined
elsewhere, by definition, and thus cannot be redefined by HTML5.
They are not defined by a WHATWG wiki page.

BTW, why are you using the term "pragma" for this metadata?
Some of them might impact browser processing, but they certainly
aren't limited to processing flags and do not in any way resemble
the pragmas found in compiler design.

....Roy

Received on Tuesday, 23 February 2010 06:36:07 UTC