RE: ISSUE-88 / Re: what's the language of a document ?

From: Richard Ishida <ishida@w3.org>
Date: Tue, 30 Mar 2010 17:21:23 +0100
To: "'Leif Halvard Silli'" <xn--mlform-iua@xn--mlform-iua.no>, "'CE Whitehead'" <cewcathar@hotmail.com>
Cc: <ian@hixie.ch>, <www-international@w3.org>, <public-html@w3.org>
Message-ID: <00c301cad025$0a936680$1fba3380$@org>
> From: Leif Halvard Silli [mailto:xn--mlform-iua@målform.no]
> Sent: 21 March 2010 16:28

> There are some XHTML document types which forbid the @lang attribute.
> When you serve these document types as 'text/html', then all language
> info is lost, as xml:lang="<whatever>" is not respected in 'text/html'.
> For such documents, using <meta> content-language enables you to at
> least define *one* language (for all elements) in a user agent
> compatible way.

Very soon this will no longer be the case.  Changes to the remaining XHTML
specs to allow them to be served as text/html are at an advanced stage, and
include the addition of the lang attribute - since that is necessary for
language information to be recognized in HTML.  So this case ought not to be
used as a basis for proposed behaviour in HTML5.
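
To make the quoted scenario concrete, here is a rough Python sketch of how a
text/html consumer might pick a document language. It is a simplified model
of the behaviour described in this thread, not how any browser is
implemented. The names _LanguageHints and document_language are made up for
illustration, and the rule "lang on <html> first, otherwise the last <meta>
content-language" is an assumption drawn from the discussion rather than
from a spec.

```python
from html.parser import HTMLParser


class _LanguageHints(HTMLParser):
    """Hypothetical helper: records language hints seen under text/html rules."""

    def __init__(self):
        super().__init__()
        self.root_lang = None      # lang on <html>; "" means explicitly unknown
        self.meta_language = None  # content of <meta> content-language

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # xml:lang is deliberately ignored: in text/html it is just an
        # ordinary attribute whose name happens to contain a colon.
        if tag == "html" and "lang" in attributes:
            self.root_lang = attributes["lang"] or ""
        elif tag == "meta":
            if (attributes.get("http-equiv") or "").lower() == "content-language":
                # If several are present, this sketch keeps the last one seen.
                self.meta_language = attributes.get("content") or ""


def document_language(markup):
    """Return the assumed document language, or None if nothing usable is found."""
    hints = _LanguageHints()
    hints.feed(markup)
    hints.close()
    if hints.root_lang is not None:
        return hints.root_lang
    return hints.meta_language
```

Under these assumptions a document carrying only xml:lang="no" yields no
language at all, adding <meta> content-language recovers one, and adding
lang="no" (which the revised XHTML specs will allow) makes the <meta>
fallback unnecessary.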

RI

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/
http://rishida.net/




> -----Original Message-----
> To: CE Whitehead
> Cc: ian@hixie.ch; www-international@w3.org; public-html@w3.org;
> ishida@w3.org
> Subject: RE: ISSUE-88 / Re: what's the language of a document ?
> 
> CE Whitehead, Sat, 20 Mar 2010 19:31:21 -0400:
> 
> [ Snip. Reply to remaining part of letter: ]
> 
> > RE: ISSUE-88 / Re: what's the language of a document ?
> > From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
> > Date: Sat, 20 Mar 2010 06:11:28 +0100
> >> However, if there is only one <meta> content-language element, then
> >> this element is both the first and the last, at once. ;-) Thus user
> >> agents will use it for setting the language. But web servers will also
> >> use the same element. If there are two, then web servers should use the
> >> first while user agents use the last.
> > But that's a "should."  And user agents should use the html lang= or
> > xml lang= too, right? And if they did we would not need two meta
> > content-language elements, right?
> 
> 
> But otherwise, the current text in HTML5 says basically the same
> thing as you: We don't need to use <meta> content-language - use
> lang="<*>" instead. This is true in theory, but not in practice. Well,
> it is true in practice as well, except for the particular use case when
> you explicitly want to set the language of an element to unknown (by
> providing an empty lang attribute). (This use case is a result of the
> fact that HTML5 aligns the meaning of an empty lang="" with that of
> an empty xml:lang="".) For this particular use
> case, it is necessary to make sure that the <meta> content-language
> element says, in a user agent compatible way, that the language is
> unknown.
> 
> [...]
> > Still I ask:  why not simply ask the browsers to respect the html
> > lang="" or xml lang="" declaration if they do not?
> 
> HTML5 does ask that they respect an empty lang="" in HTML (or an empty
> xml:lang in XHTML). Neither my change proposal nor the I18N WG's
> proposal conflicts with this.
> 
> > Would the browsers be more inclined to process a second meta
> > content-lang element set to lang=""
> > than to respect the xml lang="" or html lang=""?
> > That is my real question for you.
> 
> (The attribute of the <meta> content-language element which contains
> the language tag(s) isn't called 'lang', it is called 'content'.)
> 
> Assuming that you meant 'empty lang=""' (and not 'any lang="", empty or
> not'), then the answer to your question is that the problematic user
> agents (Mozilla family + Konqueror/Webkit/Chrome family) respect an
> empty <meta> content-language. Whereas the same two browser families,
> without going into the (important!) details (again), don't (always)
> respect an empty lang="".
> 
> I don't have anything particular to say about xml:lang="" - as I have only
> tested 'text/html' (where it has no effect).
> 
>   [...]
> > Yes, so you are saying that specifying multiple languages at this point
> > is equivalent to specifying lang=""
> 
> Richard has described the meaning of an empty xml:lang="" like this: [1]
> 
> 	]]XML also provides a means to prevent inheritance of language using
> the empty string, ie. xml:lang="". Essentially, this says: I do not
> want to associate any language with this information.[[
> 
> HTML5 changes, I believe, an empty lang="" to have the same meaning. I
> don't know if you, by the wording "specifying multiple languages",
> meant the same as Richard. From one angle it can certainly be correct
> to say that a truly multilingual document should not associate any
> particular language with itself.
> 
> [... some multiple language issues ...]
> > (I will send you a sample if you wish  -- in private email; I see no
> > reason to clutter up the list.)
> 
> Please do.
> 
> [1] http://www.w3.org/International/articles/language-tags/#overview
> --
> leif halvard silli
> 
Received on Tuesday, 30 March 2010 16:21:59 GMT
