RE: what's the language of a document ?

From: Tex Texin <textexin@xencraft.com>
Date: Mon, 26 Oct 2009 21:45:49 -0700
To: "'Ian Hickson'" <ian@hixie.ch>, "'Divya Manian'" <divya.manian@gmail.com>, "'Martin Kliehm'" <martin.kliehm@namics.com>, "'John Cowan'" <cowan@ccil.org>
Cc: <public-html@w3.org>, <www-international@w3.org>
Message-ID: <002801ca56c0$67e37570$37aa6050$@com>

Ian,

So if someone tries to be specific and declares Content-Language as "es-mx,es-ar" (for Mexico and Argentina),
or perhaps as "en, en-us", that information is thrown away in favor of "unknown"?

Also, does this change to the document default language affect only HTML behavior, or embedded scripting languages as well?

If there is code that checks the language and acts differently depending on the languages in the document, is that affected as well?

I assume so.

Why does the default need to be monolingual?

tex

-----Original Message-----
From: www-international-request@w3.org [mailto:www-international-request@w3.org] On Behalf Of Ian Hickson
Sent: Monday, October 26, 2009 6:31 PM
To: Divya Manian; Martin Kliehm; John Cowan
Cc: <public-html@w3.org>; www-international@w3.org
Subject: Re: what's the language of a document ?

On Sun, 25 Oct 2009, Divya Manian wrote:
>
> Internationalization best practices [1] states:
> 
> Where a document contains content aimed at speakers of more than one 
> language, use Content-Language with a comma-separated list of language 
> tags.
> 
> The HTML 5 specs [2] state:
> 
> there is a document-wide default language set, then that is the 
> language of the node.
> 
> If there is no document-wide default language, then language information 
> from a higher-level protocol (such as HTTP), if any, must be used as the 
> final fallback language. In the absence of any language information, the 
> default value is unknown (the empty string).
> 
> What is not clear is what happens if an HTML document is served with an
> HTTP Content-Language header that has a comma-separated list of language
> tags, and there are no other language declarations. I found a thread [3]
> stating that such a document will be treated as using the "unknown"
> language in this case. It would be good to have this case explicitly stated.

I've updated the spec to say that when the higher-level protocol reports 
multiple languages, they are all ignored in favour of the default 
(unknown).
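
The rule above can be sketched in a few lines of Python. This is only an illustration of the higher-level-protocol fallback as described in this message, not the spec's algorithm text: the function name `fallback_language` is hypothetical, and the sketch assumes no lang attribute and no document-wide default language are present, so that only the Content-Language header is consulted.

```python
def fallback_language(content_language):
    """Return the fallback language taken from a Content-Language value.

    Hypothetical helper: `content_language` is the raw header value,
    or None when the header is absent. The empty string means "unknown".
    """
    if content_language is None:
        return ""  # no language information at all: unknown

    # Split the comma-separated list and drop empty entries.
    tags = [tag.strip() for tag in content_language.split(",") if tag.strip()]

    if len(tags) == 1:
        return tags[0]  # exactly one language reported: use it

    # Zero or several languages reported: all are ignored.
    return ""


for header in (None, "es-mx", "es-mx,es-ar", "en, en-us"):
    print(repr(header), "->", repr(fallback_language(header)))
```

Under this reading, both "es-mx,es-ar" and "en, en-us" fall back to the empty string, while a single tag such as "es-mx" is kept.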


On Sun, 25 Oct 2009, Martin Kliehm wrote:
> 
> Also in XHTML notation empty strings are disallowed, so the default 
> value for "unknown" would be in that case "und". [4]

On Sun, 25 Oct 2009, John Cowan wrote:
> 
> Why would empty strings be disallowed in xml:lang attributes?  I can 
> find no indication of that in XHTML 1.0.

In HTML5, the "unknown" value is the empty string (for "lang"). The 
xml:lang attribute is defined by the XML spec.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Tuesday, 27 October 2009 09:43:27 GMT
