RE: ISSUE-88 / Re: what's the language of a document ?

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 22 Feb 2010 02:11:58 +0000 (UTC)
To: Richard Ishida <ishida@w3.org>
Cc: www-international@w3.org, public-html@w3.org, 'Maciej Stachowiak' <mjs@apple.com>
Message-ID: <Pine.LNX.4.64.1002220140160.1710@ps20323.dreamhostps.com>

On Tue, 9 Feb 2010, Richard Ishida wrote:
> 
> Are you ok to apply the points in 
> http://www.w3.org/International/wiki/Htmlissue88 to the spec?

From that document:

| [1] Replace the term 'document-wide default language' with the term 
| 'Content-Language pragma language'.

The spec currently uses the term "pragma-set default language".


| [2] [...] clarify why the HTTP and pragma declarations are different 
| when it comes to values, and how they should be used

The intent is to resolve the confusion by simply discouraging authors 
from using the pragma at all.

The proposed text:

| Note: Declarations in the HTTP header and the Content Language pragma 
| are metadata, referring to the document as a whole and expressing the 
| expected language or languages of the audience of the document.
| A language attribute on an element describes the actual language used in 
| the range of content bounded by that element (and so values are limited 
| to a single language at a time).

...seems to just muddy the waters further. Per HTTP, the Content-Language 
HTTP header is supposed to say what languages the document is intended 
for, and doesn't say anything about the contents of the document. The 
pragma, on the other hand, just sets the default language of the page. The 
pragma really has more in common with the attribute than the header, in 
terms of actual practical effect.
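
To make the distinction concrete, here is a rough sketch of the practical 
effect described above. It is an illustration under assumptions, not the 
spec's algorithm: the Element class and the element_language function are 
hypothetical names, and the HTTP header is deliberately left out of the 
lookup to reflect the reading that it describes the intended audience 
rather than the content.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical minimal element: an optional lang attribute and a parent."""
    lang: Optional[str] = None
    parent: Optional["Element"] = None


def element_language(el: Element, pragma_default: Optional[str] = None) -> str:
    """Return the language of an element under this sketch's assumptions.

    The nearest lang attribute on the element or an ancestor wins. If none
    is set, the pragma-set default language applies. An empty string stands
    for "unknown".
    """
    node: Optional[Element] = el
    while node is not None:
        if node.lang is not None:
            return node.lang
        node = node.parent
    return pragma_default if pragma_default is not None else ""


# Usage: the pragma behaves like a lang attribute placed above the root.
root = Element()
paragraph = Element(parent=root)
quote = Element(lang="fr", parent=paragraph)

print(element_language(paragraph, pragma_default="en"))  # en
print(element_language(quote, pragma_default="en"))      # fr
```

In this sketch the pragma only ever supplies a fallback for elements with 
no lang attribute in scope, which is why it sits closer to the attribute 
than to the header.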

I'm certainly open to adding more disambiguating text, but I think it 
would be helpful to have some pointers to e-mails showing the confusion so 
that a more directed disambiguation could be crafted.


| [3] [allow the pragma to have more than one value, because] There is 
| consensus that the current syntax should not be changed, and that it 
| should be possible to continue to specify multiple languages in the 
| pragma.

I disagree that there's consensus here. I don't understand the value of 
allowing authors to specify values that are going to be ignored by 
processors.


| [4] Remove 'primary' from:
|
| "The lang attribute (in no namespace) specifies the primary language for 
| the element's contents and for any of the element's attributes that 
| contain text. Its value must be a valid BCP 47 language code, or the 
| empty string. [BCP47]"
|
| Rationale:
|
| Only one language can be declared at a time.

Only one language can be _declared_ at a time, but that doesn't mean only 
one language is actually contained in the element.


| [5] [...] If the pragma attribute contains a comma-separated list of 
| languages, it cannot be determined with any degree of certainty which of 
| the languages matches the content of the text.

This was handled by changing the UA requirements of the pragma.


I recommend going through the normal process for these, by the way (using 
bugs and so forth) rather than jumping straight to the Change Proposal 
stage. It will help ensure that we keep issues focused.

Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 22 February 2010 02:12:29 UTC
