RE: what's the language of a document ?

From: Richard Ishida <ishida@w3.org>
Date: Thu, 29 Oct 2009 18:11:27 -0000
To: "'Ian Hickson'" <ian@hixie.ch>
Cc: "'Simon Pieters'" <simonp@opera.com>, "'Divya Manian'" <divya.manian@gmail.com>, "'Martin Kliehm'" <martin.kliehm@namics.com>, "'John Cowan'" <cowan@ccil.org>, <public-html@w3.org>, <www-international@w3.org>, '"Martin J. Dürst"' <duerst@it.aoyama.ac.jp>
Message-ID: <004d01ca58c3$3c51c7e0$b4f557a0$@org>
Personally, I agree with Martin here.  I have spent a long time trying to
simplify explanations so that people can understand how to manage the
various different ways of declaring language in HTML (http vs meta vs lang;
html vs xhtml vs xml), and it really concerns me that I will now have to say
"But in html5 things are slightly different again".  It's already hard
enough to get people to declare language, and I think that the changes that
come with the current text in html5 will only make things worse by causing
further confusion. On the other hand, I think there may be a way to satisfy

We discussed this during the Internationalization WG telecon last night, and
I was actioned to put the following to you and the HTML group on behalf of
the i18n WG.

Our proposal is as follows and is based on the text of the following

[1] Explain clearly that declarations in the http header and the meta
element refer to the document as an object, rather than the text in a
specific element (this is what makes the distinction between single and
multiple values sensible). 

[2] Continue to recommend that the document-wide default language be defined
by a lang attribute on the html tag, but say that if the lang attribute is
missing and there is a language defined in the http or meta, then those
language declarations can be used to guess the language of the text, if they
contain a single value.

[3] Establish the precedence between http vs meta.  

[4] Establish the rule that multiple values in the place that has precedence
equates to lang="".

This is very close to what we already have, but doesn't try to make the meta
declaration a different thing than the http declaration, or change it so
that multiple values are no longer valid.  At the same time, it allows
either the http or the meta to provide language information for
text-processing, if the declaration is useable.
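
As a rough illustration only, points [2] to [4] above could be sketched in Python as follows. The function and parameter names are hypothetical, and the http_first flag is an assumption: the proposal asks for a precedence between http and meta to be established but does not say which should win. Falling through to the second declaration when the first is absent is likewise an assumption, not something stated in the proposal.

```python
from typing import List, Optional


def split_language_tags(value: Optional[str]) -> List[str]:
    """Split a Content-Language style value into its comma-separated tags."""
    if not value:
        return []
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def guess_text_language(
    html_lang: Optional[str],
    http_value: Optional[str],
    meta_value: Optional[str],
    http_first: bool = True,  # assumption: the proposal leaves the order open
) -> str:
    """Return the language to assume for the text; "" means unknown."""
    # Point [2]: a lang attribute on the html element is used when present.
    if html_lang is not None:
        return html_lang
    # Point [3]: consult the http and meta declarations in precedence order.
    ordered = [http_value, meta_value] if http_first else [meta_value, http_value]
    for value in ordered:
        tags = split_language_tags(value)
        if not tags:
            continue  # assumption: an absent declaration defers to the other
        # Point [4]: multiple values in the winning place equate to lang="".
        return tags[0] if len(tags) == 1 else ""
    return ""  # nothing declared anywhere


print(repr(guess_text_language(None, "en, fr", "de")))
```

In this sketch the multiple-value http declaration takes precedence and yields lang="", so the single-valued meta declaration is never consulted.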

We also feel that the spec seems to restrict the use of the term
'document-wide default language' to refer only to a language declared using
the meta, and this is rather odd.  We feel that in fact the lang attribute
on the html element also establishes a document-wide default language. (See
the text: "Until the pragma is successfully processed, there is no
document-wide default language.")


PS: I could suggest some changes to the wording, if that helps.

Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)


> -----Original Message-----
> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org] On Behalf Of "Martin J. Dürst"
> Sent: 27 October 2009 11:09
> To: Ian Hickson
> Cc: Simon Pieters; Divya Manian; Martin Kliehm; John Cowan;
> <public-html@w3.org>; www-international@w3.org
> Subject: Re: what's the language of a document ?
> On 2009/10/27 19:37, Ian Hickson wrote:
> > On Tue, 27 Oct 2009, Simon Pieters wrote:
> >> This doesn't match what's specced for <meta
> >> http-equiv=content-language content=foo,bar>.
> >
> > That's intentional, and is based on data about how people actually use
> > that pragma.
> There's always a way to justify inconsistent choices (be it browser
> implementations, 'data' about how people (who?) use some feature (at
> what point in time?),...). But it would be way better to be consistent.
> And there is always a way to justify making choices that everybody
> except those knowing all the details of the spec don't understand. But
> it would be way better to make choices that are easy to understand (e.g.
> http-equiv actually meaning what it says, namely "equivalent to the
> corresponding HTTP header").
> There are lots of cases where over time, people have come to a better
> understanding of how things work. For stuff that authors/producers
> aren't supposed to produce, I don't mind too much that HTML5 is
> hopelessly complex and inconsistent. I can live without remembering it
> all, and can tell others to avoid it. However, for stuff like the above,
> which may be used even by very consciously clean developers, creating
> inconsistencies such as the above is a heavy negative legacy.
> Regards,   Martin.
> --
> #-# Martin J. Dürst, Professor, Aoyama Gakuin University
> #-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
Received on Thursday, 29 October 2009 18:12:25 UTC
