
RE: what's the language of a document ?

From: Tex Texin <textexin@xencraft.com>
Date: Sat, 31 Oct 2009 18:05:54 -0700
To: "'Richard Ishida'" <ishida@w3.org>, "'Ian Hickson'" <ian@hixie.ch>
Cc: "'Simon Pieters'" <simonp@opera.com>, "'Divya Manian'" <divya.manian@gmail.com>, "'Martin Kliehm'" <martin.kliehm@namics.com>, "'John Cowan'" <cowan@ccil.org>, <public-html@w3.org>, <www-international@w3.org>, '"Martin J. Dürst"' <duerst@it.aoyama.ac.jp>
Message-ID: <002e01ca5a8f$777acc40$667064c0$@com>
Re: [3] Establish the precedence between http vs meta.  
 
I wish we could eliminate this nonsense altogether.
The description of the content of a document should be self-contained within
the document and not in the protocol.
The protocol should only ever reflect what is in the document to enable
routing and filters etc.
But documents should be self-declared.
 
 
[1] Explain clearly that declarations in the http header and the meta
element refer to the document as an object, rather than the text in a
specific element (this is what makes the distinction between single and
multiple values sensible).
 
This is contrived.
There is no reason an element cannot contain sub-elements that are in
different languages, so why force a single language description?
 
There is value to letting a processing agent know which languages are
included so it can use appropriate rendering rules and have the right
resources loaded (e.g., fonts), as opposed to having it run into a new language
and react dynamically.
 
It is fine to declare that one language is primary, to establish the
overall treatment of the text, or to be the default for text without a
language declaration, but there is no reason to pretend elements are
monolingual when they are not.
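 
To illustrate the point, here is a minimal sketch of how a processing agent
might gather every language declared in a document before rendering it. It
is an illustration only: the names LangCollector and declared_languages are
hypothetical, and it looks only at lang attributes, not at the http header
or the meta element.

```python
from html.parser import HTMLParser


class LangCollector(HTMLParser):
    """Collect every language tag declared through a lang attribute."""

    def __init__(self):
        super().__init__()
        self.languages = []  # kept in first-seen order, without repeats

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names, so "lang" matches LANG too.
        for name, value in attrs:
            if name == "lang" and value and value not in self.languages:
                self.languages.append(value)


def declared_languages(markup):
    """Return the language tags declared anywhere in the markup."""
    collector = LangCollector()
    collector.feed(markup)
    collector.close()
    return collector.languages


markup = (
    '<html lang="en"><body><p>Hello <span lang="ja">konnichiwa</span> '
    'and <q lang="fr">bonjour</q></p></body></html>'
)
languages = declared_languages(markup)
```

With a list like this, an agent could load fonts and rendering rules for
all declared languages up front, while still treating the first one as the
default for undeclared text.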
 
[4] Establish the rule that multiple values in the place that has precedence
equates to lang="".
 
Why would you remove information that has been provided?
 
 
-----Original Message-----
From: www-international-request@w3.org
[mailto:www-international-request@w3.org] On Behalf Of Richard Ishida
Sent: Thursday, October 29, 2009 11:11 AM
To: 'Ian Hickson'
Cc: 'Simon Pieters'; 'Divya Manian'; 'Martin Kliehm'; 'John Cowan';
public-html@w3.org; www-international@w3.org; '"Martin J. Dürst"'
Subject: RE: what's the language of a document ?
 
Personally, I agree with Martin here.  I have spent a long time trying
to simplify explanations so that people can understand how to manage the
various different ways of declaring language in HTML (http vs meta vs lang;
html vs xhtml vs xml), and it really concerns me that I will now have to say
"But in html5 things are slightly different again".    It's already hard
enough to get people to declare language, and I think that the changes that
come with the current text in html5 will only make things worse by causing
further confusion. On the other hand, I think there may be a way to satisfy
everyone.
 
We discussed this during the Internationalization WG telecon last night, and
I was actioned to put the following to you and the HTML group on behalf of
the i18n WG.
 
 
Our proposal is as follows and is based on the text of the following
sections:
http://www.whatwg.org/specs/web-apps/current-work/multipage/semantics.html#document-wide-default-language
http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#the-lang-and-xml:lang-attributes
 
 
[1] Explain clearly that declarations in the http header and the meta
element refer to the document as an object, rather than the text in a
specific element (this is what makes the distinction between single and
multiple values sensible). 
 
[2] Continue to recommend that the document-wide default language be defined
by a lang attribute on the html tag, but say that if the lang attribute is
missing and there is a language defined in the http or meta, then those
language declarations can be used to guess the language of the text, if they
contain a single value.
 
[3] Establish the precedence between http vs meta.  
 
[4] Establish the rule that multiple values in the place that has precedence
equates to lang="".
 
This is very close to what we already have, but doesn't try to make the meta
declaration a different thing than the http declaration, or change it so
that multiple values are no longer valid.  At the same time, it allows
either the http or the meta to provide language information for
text-processing, if the declaration is useable.
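 
As a rough illustration only (not proposed spec text), rules [2] to [4]
could be sketched as below. The function names are hypothetical. Several
points are assumptions, because the proposal leaves them open: that http
takes precedence over meta by default ([3] only asks that a precedence be
established), that the lower-precedence source is consulted when the
higher one is absent, and that "" is returned when nothing is declared.

```python
from typing import List, Optional


def split_language_list(value: Optional[str]) -> List[str]:
    """Split a comma-separated language declaration into trimmed tags."""
    if not value:
        return []
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def guess_text_language(
    html_lang: Optional[str],
    http_value: Optional[str],
    meta_value: Optional[str],
    http_first: bool = True,  # assumption: the proposal does not fix the order
) -> str:
    """Return the language to assume for the text, or "" for unknown."""
    # [2] A lang attribute on the html element wins whenever it is present.
    if html_lang is not None:
        return html_lang

    # [3] Consult http and meta in the chosen order of precedence.
    if http_first:
        sources = (http_value, meta_value)
    else:
        sources = (meta_value, http_value)

    for value in sources:
        tags = split_language_list(value)
        if not tags:
            continue  # assumption: an absent declaration defers to the other
        # [4] Multiple values in the place with precedence equate to lang="".
        return tags[0] if len(tags) == 1 else ""

    return ""  # assumption: nothing declared anywhere means unknown
```

For example, guess_text_language(None, "en, fr", "de") returns "" because
the http declaration has precedence here and lists more than one language.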
 
We also feel that the spec seems to restrict the use of the term
'document-wide default language' to refer only to a language declared using
the meta, and this is rather odd.  We feel that in fact the lang attribute
on the html element also establishes a document-wide default language. (See
the text: "Until the pragma is successfully processed, there is no
document-wide default language.")
 
RI
 
PS: I could suggest some changes to the wording, if that helps.
 
 
============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)
 
http://www.w3.org/International/
http://rishida.net/
 
 
 
 
> -----Original Message-----
> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org] On Behalf Of "Martin J. Dürst"
> Sent: 27 October 2009 11:09
> To: Ian Hickson
> Cc: Simon Pieters; Divya Manian; Martin Kliehm; John Cowan;
> <public-html@w3.org>; www-international@w3.org
> Subject: Re: what's the language of a document ?
> 
> On 2009/10/27 19:37, Ian Hickson wrote:
> > On Tue, 27 Oct 2009, Simon Pieters wrote:
> >> This doesn't match what's specced for
> >> <meta http-equiv=content-language content=foo,bar>.
> >
> > That's intentional, and is based on data about how people actually use
> > that pragma.
> 
> There's always a way to justify inconsistent choices (be it browser
> implementations, 'data' about how people (who?) use some feature (at
> what point in time?),...). But it would be way better to be consistent.
> 
> And there is always a way to justify making choices that everybody
> except those knowing all the details of the spec don't understand. But
> it would be way better to make choices that are easy to understand (e.g.
> http-equiv actually meaning what it says, namely "equivalent to the
> corresponding HTTP header").
> 
> There are lots of cases where over time, people have come to a better
> understanding of how things work. For stuff that authors/producers
> aren't supposed to produce, I don't mind too much that HTML5 is
> hopelessly complex and inconsistent. I can live without remembering it
> all, and can tell others to avoid it. However, for stuff like the above,
> which may be used even by very consciously clean developers, creating
> inconsistencies such as the above is a heavy negative legacy.
> 
> Regards,   Martin.
> 
> --
> #-# Martin J. Dürst, Professor, Aoyama Gakuin University
> #-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
 
 
 
Received on Sunday, 1 November 2009 01:06:39 GMT
