RE: what's the language of a document ?

I personally tend to agree with Roy Fielding, John Cowan, and Tex Texin, and not with Martin and Richard Ishida, because I regularly create documents in two languages (French-English; French-Old French). Following Richard Ishida's recommendations in "Specifying Languages in XHTML and HTML Content," I list all the languages in the meta content tag when I have access to it. (My documents are generally served from a locale I don't control, so I don't have access to the http headers.) I still set the html language to one or the other when possible and then, if I get time, specify additional information in the relevant elements.


I think there will always be cases where people will not tag a document correctly; if a tag is needed, it makes no sense to eliminate it because someone cannot yet use it properly.  And I think that Tex makes a good point too: someone might specify a document language as fr-FR and fr-LU but not fr-CA, and it makes no sense to default to unknown.


However I'll look at the proposal.
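
To check my own reading of points [2] to [4] in the proposal quoted below, here is a rough Python sketch of the lookup they seem to describe. It is only an illustration, not text from the spec or from the i18n WG. The function names, the parameter names, and the fallback to the next source when the preferred one is absent are my assumptions. Point [3] leaves the precedence between http and meta still to be established, so the order is a parameter here, and its default (http first) is purely a guess.

```python
def split_values(declaration):
    """Split a Content-Language style value such as "fr, en" into tags."""
    if declaration is None:
        return []
    return [tag.strip() for tag in declaration.split(",") if tag.strip()]


def guess_text_language(html_lang=None, http_header=None, meta_pragma=None,
                        precedence=("http", "meta")):
    """Return the language to assume for the text, or "" for unknown.

    All names here are hypothetical; `precedence` stands in for the rule
    that point [3] says still has to be established.
    """
    # [2] A lang attribute on the html element defines the default.
    if html_lang is not None:
        return html_lang
    sources = {"http": http_header, "meta": meta_pragma}
    for name in precedence:
        values = split_values(sources[name])
        if not values:
            continue  # assumption: an absent declaration defers to the next
        # [4] Multiple values in the place that has precedence
        # equate to lang="".
        if len(values) > 1:
            return ""
        # [2] A single value can be used to guess the text language.
        return values[0]
    return ""  # assumption: nothing declared anywhere means unknown


print(repr(guess_text_language(meta_pragma="fr")))
print(repr(guess_text_language(http_header="fr, en", meta_pragma="fr")))
```

Under this reading, one of my French-English documents with both languages listed in the meta and no lang attribute on the html element would come out as unknown, which is why I still set the html language whenever I can.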




C. E. Whitehead 
> Date: Thu, 29 Oct 2009 18:11:27 +0000
> Subject: RE: what's the language of a document ?
> Personally, I agree with Martin here. I have spent a long time trying to
> simplify explanations so that people can understand how to manage the
> various different ways of declaring language in HTML (http vs meta vs lang;
> html vs xhtml vs xml), and it really concerns me that I will now have to say
> "But in html5 things are slightly different again". It's already hard
> enough to get people to declare language, and I think that the changes that
> come with the current text in html5 will only make things worse by causing
> further confusion. On the other hand, I think there may be a way to satisfy
> everyone.
> We discussed this during the Internationalization WG telecon last night, and
> I was actioned to put the following to you and the HTML group on behalf of
> the i18n WG.
> Our proposal is as follows and is based on the text of the following
> sections:
> ocument-wide-default-language
> e-lang-and-xml:lang-attributes
> [1] Explain clearly that declarations in the http header and the meta
> element refer to the document as an object, rather than the text in a
> specific element (this is what makes the distinction between single and
> multiple values sensible). 
> [2] Continue to recommend that the document-wide default language be defined
> by a lang attribute on the html tag, but say that if the lang attribute is
> missing and there is a language defined in the http or meta, then those
> language declarations can be used to guess the language of the text, if they
> contain a single value.
> [3] Establish the precedence between http vs meta. 
> [4] Establish the rule that multiple values in the place that has precedence
> equates to lang="".
> This is very close to what we already have, but doesn't try to make the meta
> declaration a different thing than the http declaration, or change it so
> that multiple values are no longer valid. At the same time, it allows
> either the http or the meta to provide language information for
> text-processing, if the declaration is useable.
> We also feel that the spec seems to restrict the use of the term
> 'document-wide default language' to refer only to a language declared using
> the meta, and this is rather odd. We feel that in fact the lang attribute
> on the html element also establishes a document-wide default language. (See
> the text: "Until the pragma is successfully processed, there is no
> document-wide default language.")
> RI
> PS: I could suggest some changes to the wording, if that helps.
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
> > -----Original Message-----
> > From: [mailto:www-international-
> >] On Behalf Of "Martin J. Dürst"
> > Sent: 27 October 2009 11:09
> > To: Ian Hickson
> > Cc: Simon Pieters; Divya Manian; Martin Kliehm; John Cowan; <public-
> >>;
> > Subject: Re: what's the language of a document ?
> > 
> > On 2009/10/27 19:37, Ian Hickson wrote:
> > > On Tue, 27 Oct 2009, Simon Pieters wrote:
> > >> This doesn't match what's specced for <meta http-equiv=content-language
> > >> content=foo,bar>.
> > >
> > > That's intentional, and is based on data about how people actually use
> > > that pragma.
> > 
> > There's always a way to justify inconsistent choices (be it browser
> > implementations, 'data' about how people (who?) use some feature (at
> > what point in time?),...). But it would be way better to be consistent.
> > 
> > And there is always a way to justify making choices that everybody
> > except those knowing all the details of the spec don't understand. But
> > it would be way better to make choices that are easy to understand (e.g.
> > http-equiv actually meaning what it says, namely "equivalent to the
> > corresponding HTTP header").
> > 
> > There are lots of cases where over time, people have come to a better
> > understanding of how things work. For stuff that authors/producers
> > aren't supposed to produce, I don't mind too much that HTML5 is
> > hopelessly complex and inconsistent. I can live without remembering it
> > all, and can tell others to avoid it. However, for stuff like the above,
> > which may be used even by very consciously clean developers, creating
> > inconsistencies such as the above is a heavy negative legacy.
> > 
> > Regards, Martin.
> > 
> > --
> > #-# Martin J. Dürst, Professor, Aoyama Gakuin University
> > #-#

Received on Thursday, 29 October 2009 18:48:14 UTC