RE: ISSUE-88 / Re: what's the language of a document ?

From: Richard Ishida <ishida@w3.org>
Date: Wed, 24 Feb 2010 19:25:07 -0000
To: "'Ian Hickson'" <ian@hixie.ch>
Cc: <www-international@w3.org>, <public-html@w3.org>, "'Maciej Stachowiak'" <mjs@apple.com>
Message-ID: <013801cab587$13055750$391005f0$@org>

Hi Ian,

We just discussed this on the i18n telecon, and the group asked me to send
these comments on its behalf.

> -----Original Message-----
> From: public-html-request@w3.org [mailto:public-html-request@w3.org] On
> Behalf Of Ian Hickson
> Sent: 22 February 2010 02:12
> To: Richard Ishida
> Cc: www-international@w3.org; public-html@w3.org; 'Maciej Stachowiak'
> Subject: RE: ISSUE-88 / Re: what's the language of a document ?
> On Tue, 9 Feb 2010, Richard Ishida wrote:
> >
> > Are you ok to apply the points in
> > http://www.w3.org/International/wiki/Htmlissue88 to the spec?
> From that document:
> | [1] Replace the term 'document-wide default language' with the term
> | 'Content-Language pragma language'.
> The spec currently uses the term "pragma-set default language".
> | [2] [...] clarify why the HTTP and pragma declarations are different
> | when it comes to values, and how they should be used
> The confusion is intended to be clarified by simply discouraging authors
> from using the pragma at all.
> The proposed text:
> | Note: Declarations in the HTTP header and the Content Language pragma
> | are metadata, referring to the document as a whole and expressing the
> | expected language or languages of the audience of the document.
> | A language attribute on an element describes the actual language used in
> | the range of content bounded by that element (and so values are limited
> | to a single language at a time).
> ...seems to just muddy the waters further. Per HTTP, the Content-Language
> HTTP header is supposed to say what languages the document is intended
> for, and doesn't say anything about the contents of the document. The
> pragma, on the other hand, just sets the default language of the page. The
> pragma really has more in common with the attribute than the header, in
> terms of actual practical effect.

I believe we are agreeing that the HTTP header contains metadata about the
document as a whole and it's best to use the language attribute to specify
the language of content. That's what I wrote in the proposed note above.  

It's significant that the thing we're calling the pragma is a use of a
<meta> element.  It's metadata, and the view of the i18n WG is that it
should be available for use to specify metadata if you need to do so *in the
document*.  We have also conceded that, if there is no lang attribute for
some content, then it would be reasonable to infer the language of that
content from the pragma, if it contains a single language value (because
otherwise you can't be sure).
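
As a minimal sketch of that fallback, in Python: the function names below are
hypothetical and are not taken from the spec or the Change Proposal, and
`lang_attr` is assumed to be the value of the nearest applicable lang
attribute (None if there is none).

```python
from typing import List, Optional


def parse_content_language(value: str) -> List[str]:
    """Split a Content-Language value into its comma-separated language tags."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def infer_language(lang_attr: Optional[str],
                   pragma_value: Optional[str]) -> Optional[str]:
    """Return the language to assume for some content, or None if unknown."""
    # An explicit lang attribute always wins; the empty string means "unknown".
    if lang_attr is not None:
        return lang_attr or None
    # With no lang attribute, fall back to the pragma, but only when it holds
    # exactly one language value (otherwise you can't be sure which applies).
    if pragma_value is not None:
        tags = parse_content_language(pragma_value)
        if len(tags) == 1:
            return tags[0]
    return None
```

Under these assumptions, a pragma value of "en" would give "en" for content
with no lang attribute, while "en, fr" would give no inferred language; in
both cases the full list of values stays available as metadata.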

It's true that a lot of people misunderstood the use of this pragma in the
past, but that's what we're trying to clarify here (and btw I've seen
evidence that that is changing).

The i18n WG agrees that authors should be discouraged from using the pragma
for the purposes that the lang attribute should be used for, but we are also
saying that its use should be *encouraged* for cases where you want to
specify metadata inside the document.  I was at a large meeting of language
professionals organized by the EC a week ago, and there is a huge interest
in improving the use of metadata on the Web. This is where the pragma can be
of use, when used correctly.

And if you are using this to specify metadata, you must allow for multiple
values.  What's more, in the opinion of the i18n WG, changing the syntax of
the pragma to accept only one language is likely to confuse people further,
since the pragma would then appear to be more like the lang attribute.  In
addition, the behaviour would be different to previous versions of HTML,
which further complicates explanations about how to handle language in HTML.

In addition, we are worried about the effect on legacy data of changing the
number of allowed language values for this meta element.  There may not be
much such data out there, but there may also be some, and we felt that such
a change is inconsistent with the efforts of the HTML folks to maintain
backwards compatibility in other areas.

> I'm certainly open to adding more disambiguating text, but I think it
> would be helpful to have some pointers to e-mails showing the confusion so
> that a more directed disambiguation could be crafted.
> | [3] [allow the pragma to have more than one value, because] There is
> | consensus that the current syntax should not be changed, and that it
> | should be possible to continue to specify multiple languages in the
> | pragma.
> I disagree that there's consensus here. I don't understand the value of
> allowing authors to specify values that are going to be ignored by
> processors.

There were discussions on the html list in which nearly everyone who
expressed an opinion was against the redefinition of the pragma to accept
only one value - you were the person arguing for the change.  This was why
we wanted to talk with you at TPAC and go through the proposals in
http://lists.w3.org/Archives/Public/public-html/2009Oct/1086.html (on which
the Change Proposal is based), and we left that meeting understanding that
you had agreed to the proposals. (In fact we were quite surprised about your
comments here.) Furthermore, that proposal has been on the list since
October, and the Change Proposal has been around since the beginning of
January, but we are unaware of anyone who has objected to that aspect of the
proposals other than you.

> | [4] Remove 'primary' from:
> |
> | "The lang attribute (in no namespace) specifies the primary language for
> | the element's contents and for any of the element's attributes that
> | contain text. Its value must be a valid BCP 47 language code, or the
> | empty string. [BCP47]"
> |
> | Rationale:
> |
> | Only one language can be declared at a time.
> Only one language can be _declared_ at a time, but that doesn't mean only
> one language is actually contained in the element.
> | [5] [...] If the pragma attribute contains a comma-separated list of
> | languages, it cannot be determined with any degree of certainty which of
> | the languages matches the content of the text.
> This was handled by changing the UA requirements of the pragma.
> I recommend going through the normal process for these, by the way (using
> bugs and so forth) rather than jumping straight to the Change Proposal
> stage. It will help ensure that we keep issues focused.

Actually, we have been following the process.  The original bug report, which
you rejected, is at http://www.w3.org/Bugs/Public/show_bug.cgi?id=8088 and
Mike Smith raised it in Tracker as ISSUE-88, at
http://www.w3.org/html/wg/tracker/issues/88.  The i18n WG took over the
action from Leif to generate the Change Proposal as per the process, and the
CP has been tracked through several HTML WG teleconferences since.


Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

Received on Wednesday, 24 February 2010 19:25:41 UTC