RE: Meeting preread: Language declarations

Feedback is inline, indicated by '>'.
--------------------------------------------------

Question


Should I use HTTP Content-Language or a meta element to declare the language of my XHTML document?


Answer


[Read the background information]

A distinction needs to be drawn between declaring the language of content and identifying the language(s) of the intended audience. It is recommended that the lang and xml:lang attributes be used for the former, and the HTTP Content-Language setting or a meta element be used, if appropriate or needed, for the latter.

> In essence, it seems that you're recommending using both. I don't understand the distinction between 'language of content' and 'language(s) of the intended audience'.


Primary language


The discussion about the pros and cons of attributes vs. meta elements or HTTP headers centers on information about the language of the document as a whole, i.e. the primary language of the document. Note that a document with a primary language of, say, French may also contain fragments in other languages. To indicate the language of content for a fragment of a document there is no other choice but to use the lang and xml:lang attributes.

In very rare circumstances, a document may have more than one primary language. Here we are talking about documents that repeat the same or parallel content in more than one language, and where both linguistic parts have the same weight. This is rare on the Web because it is usually much easier to link to separate pages for each localized version.

> I think the paragraph above makes a useful point, one that comes to mind immediately when thinking about 'fragments' in different languages, i.e. what happens when a 'fragment' is of equivalent size to the rest.


Declaring the language of content


When declaring the language of a specific range of content, there can and must only be one language specified, and that language information will be inherited by all subelements, unless a lower level element applies a different value.

The standard way to declare the language of a document is to use the lang and xml:lang attributes in the html tag. Since all other elements in the document are descendants of the html element, they naturally inherit this value. Current user agents recognize language values declared in this way when they come to apply language-specific styling, default fonts for Chinese, Japanese and Korean, etc.
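
The inheritance rule described above can be sketched in a few lines of Python. This is only an illustration, assuming a well-formed XHTML document and the standard library's ElementTree parser; the function name effective_languages and the sample markup are hypothetical, and the sketch ignores edge cases such as an empty xml:lang value.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(element, inherited=None):
    """Yield (element, language) pairs, applying inheritance top-down."""
    # A value on the element itself replaces whatever was inherited.
    lang = element.get(XML_LANG) or element.get("lang") or inherited
    yield element, lang
    for child in element:
        yield from effective_languages(child, lang)


# Hypothetical sample: a French page containing an English fragment.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" lang="fr" xml:lang="fr">
  <body>
    <p>Bonjour <span lang="en" xml:lang="en">hello <em>world</em></span></p>
  </body>
</html>"""

for element, lang in effective_languages(ET.fromstring(SAMPLE)):
    # Strip the namespace that ElementTree prepends to tag names.
    print(element.tag.split("}")[-1], lang)
```

Here the em element carries no attribute of its own but is reported as English, because it sits inside the span that overrides the French value declared on the html tag.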


Use of the meta declaration


A key issue with the use of the meta tag is that it is not obvious how it's value is inherited by elements throughout the document. In current practice browsers do not recognize the value in the meta declaration when processing the document.

> Pedantic point: no apostrophe in 'its'.

Note also that the meta declaration allows you to declare more than one language. In such a case the meta element could not be declaring the language of content, since a range of content can only be in one language at a time. On the other hand, if you are dealing with a document that has multiple primary languages, you could conceive of this information being useful for classification of the document or enabling easier recognition of document appropriateness when searching. In this case, however, the meta statement is still really indicating the intended audience of the document, rather than declaring the language of a range of content. (Note that this cannot be done using the lang and xml:lang attributes, since they support only one language value at a time.) The author is not aware of any standard way in which such information is used at the moment.
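
To make the point about multiple values concrete, here is a rough Python sketch of how a tool might read the meta declaration and see that it holds a list rather than a single language. It assumes the standard library's html.parser module; the class MetaLanguageParser and the function meta_languages are hypothetical names, not part of any existing tool.

```python
from html.parser import HTMLParser


class MetaLanguageParser(HTMLParser):
    """Collect the language tags from a Content-Language meta element."""

    def __init__(self):
        super().__init__()
        self.languages = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        http_equiv = (attributes.get("http-equiv") or "").lower()
        if tag == "meta" and http_equiv == "content-language":
            content = attributes.get("content") or ""
            # The content value is a comma-separated list of language tags.
            self.languages = [t.strip() for t in content.split(",") if t.strip()]


def meta_languages(markup):
    parser = MetaLanguageParser()
    parser.feed(markup)
    return parser.languages


tags = meta_languages('<meta http-equiv="Content-Language" content="de, fr, it"/>')
print(tags)            # ['de', 'fr', 'it']
print(len(tags) == 1)  # False: a list cannot name the language of one range of content
```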


Use of the HTTP Content-Language header


The HTTP Content-Language header could be used to declare the primary language of content, as long as only a single value was expressed. As soon as a list of languages is sent in the HTTP header this can no longer serve the same purpose. Note also that precedence rules mean that any declaration in the html tag overrides that in the HTTP header.

> What is/was the purpose of allowing a list of languages to be specified in the HTTP header?
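
The two rules in the draft paragraph above (a header can stand in for the primary language only when it carries a single value, and a declaration in the html tag overrides the header) can be written down as a small Python sketch. The function name primary_language and its parameters are hypothetical; this illustrates the reasoning and does not describe how any browser behaves.

```python
def primary_language(html_lang=None, content_language_header=None):
    """Return the primary language of content, or None if it cannot be determined."""
    # A declaration in the html tag takes precedence over the HTTP header.
    if html_lang and html_lang.strip():
        return html_lang.strip()
    header = content_language_header or ""
    tags = [t.strip() for t in header.split(",") if t.strip()]
    # A header with a single value can serve as a primary-language declaration.
    if len(tags) == 1:
        return tags[0]
    # A list (or no header at all) says nothing definite about the content.
    return None


print(primary_language(html_lang="fr", content_language_header="en"))  # fr
print(primary_language(content_language_header="en"))                  # en
print(primary_language(content_language_header="de, fr, it"))          # None
```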

In addition, there are potential issues surrounding the maintenance and use of server-side information. Many authors may find it difficult to access server settings, particularly when dealing with an ISP. So this, unlike the html attribute approach, is not a solution that is always available.

It is for these reasons, and because the use of the html attributes is simple, standard and currently implemented in browsers, that the html attribute approach is recommended.

> I still don't really understand why we don't say 'do it if you can' * and/or 'check what is being sent' (even if you can't change it, you can at least know what is sent and what impact, if any, it has) and/or 'these are the sorts of questions for your ISP or webmaster/sys admin'.

* CJKV (Ken Lunde, O'Reilly, Oct 2002) says "Instead of arguing which method is best, why not simply support both methods whenever possible?"


By the way


Note that this discussion is very different from that about the use of the meta charset declaration and HTTP charset header. There is no alternative markup construct in XHTML for declaring the character encoding of a document, and the HTTP header takes precedence over the meta declaration.

> I think the reasons why this is a very different discussion could usefully be made clearer.


Background


> These examples are useful, because I think many people may find it difficult to talk abstractly about where language information is specified, but they can recognize the examples.

In current practice one can find XHTML documents that provide information about the language of a page in a number of different ways.

One method is to use the lang and xml:lang attributes on the html tag.

Example:
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">

Alternatively, you may find documents that provide this information using a meta element.

Example:
<meta http-equiv="Content-Language" content="en"/>

Language information may also be found in the HTTP header that is sent with a document (see the last line in the following example of an HTTP header).

> Could a useful FAQ be: How do I view an HTTP header and what information does it contain about language? 

Example:
HTTP/1.1 200 OK
Date: Wed, 05 Nov 2003 10:46:04 GMT
Server: Apache/1.3.28 (Unix) PHP/4.2.3
Content-Location: CSS2-REC.en.html
Vary: negotiate,accept-language,accept-charset
TCN: choice
P3P: policyref=http://www.w3.org/2001/05/P3P/p3p.xml
Cache-Control: max-age=21600
Expires: Wed, 05 Nov 2003 16:46:04 GMT
Last-Modified: Tue, 12 May 1998 22:18:49 GMT
ETag: "3558cac9;36f99e2b"
Accept-Ranges: bytes
Content-Length: 10734
Connection: close
Content-Type: text/html; charset=iso-8859-1
Content-Language: en
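
For readers who want to pick the language information out of a header like the one above, the following Python sketch may help. It assumes the header text has already been captured as a string (for example, copied from a tool that displays response headers) and uses the standard library's email.parser to split it into fields; the function name content_language and the shortened sample are hypothetical.

```python
from email.parser import Parser

# Shortened copy of the example header, for illustration only.
RAW_RESPONSE_HEAD = """HTTP/1.1 200 OK
Date: Wed, 05 Nov 2003 10:46:04 GMT
Content-Type: text/html; charset=iso-8859-1
Content-Language: en
"""


def content_language(raw_head):
    """Return the list of language tags in the Content-Language field."""
    # Drop the status line; the rest is a block of "Name: value" fields.
    _status_line, _, header_block = raw_head.partition("\n")
    headers = Parser().parsestr(header_block, headersonly=True)
    value = headers.get("Content-Language")  # field names are case-insensitive
    if value is None:
        return []
    return [t.strip() for t in value.split(",") if t.strip()]


print(content_language(RAW_RESPONSE_HEAD))  # ['en']
```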

It is also worth noting that the meta element and the HTTP header support a list of values. The example below declares the primary languages of the document to be (in equal measure) German, French and Italian.

Example:
<meta http-equiv="Content-Language" content="de, fr, it"/>

The question is, which of these methods is the best approach? [Jump to the answer]


Further reading


* Hints & Tips: Character Encodings http://www.w3.org/International/O-charset.html


* Tutorial: Character sets & encodings in XHTML, HTML and CSS http://www.w3.org/International/tutorials/tutorial-char-enc.html


* FAQ: Checking the character encoding using the validator http://www.w3.org/International/questions/qa-validator-charset-check.html


* FAQ: Setting language preferences in a browser http://www.w3.org/International/questions/qa-lang-priorities.html


* Unicode Enabled Products http://www.unicode.org/onlinedat/products.html


* Encoding Forms http://www.unicode.org/standard/principles.html#Encoding_Forms


 -----Original Message----- 
 From: public-i18n-geo-request@w3.org on behalf of Richard Ishida 
 Sent: Tue 08/06/2004 19:40 
 To: GEO 
 Cc: 
 Subject: Meeting preread: Language declarations
 
 


 Folks,
 
 I have put together a first draft of a new FAQ that we should discuss in our meeting tomorrow.
 
 http://www.w3.org/International/questions/qa-http-and-lang.html

 
 The question is "Should I use HTTP Content-Language or a meta element to declare the language of my XHTML document?"
 
 Following on and arising from those thoughts I have worked on a new set of techniques for the language WD.  See below.  Please send your comments, and if we have time, let's also discuss this tomorrow. These ideas break some new ground, and will also need to be reviewed by Core and others.
 
 
 
 CURRENT VERSION OF OUTLINE
 
 Specifying the overall language of a document
 •       Always declare the language of the page as a whole in the html tag.
	•       For HTML use the lang attribute only, for XHTML 1.0 served as text/html use the lang and xml:lang attributes, and for XHTML served as XML use the xml:lang attribute only.
 •       Do not use the meta tag to declare the language of a document.
 •       Do not declare the language of a document in the body tag.
 
 Identifying in-document language changes
 •       Use the lang and/or xml:lang attributes around text to indicate any changes in language.
 •       For HTML use the lang attribute only, for XHTML 1.0 served as text/html use the lang and xml:lang attributes, and for XHTML served as XML use the xml:lang attribute only.
 
 
 
 NEW VERSION OF OUTLINE
 
 Declaring the language of content
 •       Unless there is more than one primary language, always declare the primary language of the page in the html tag.
 •       If there is more than one primary language, try to keep content in each primary language in a separate block, and always declare the language in the tags that define those blocks.
 •       For HTML use the lang attribute only, for XHTML 1.0 served as text/html use the lang and xml:lang attributes, and for XHTML served as XML use the xml:lang attribute only.
 •       Do not use a meta tag or HTTP header to declare the language of content.
 •       Do not declare the language of the page in the body tag.
 •       Use the lang and/or xml:lang attributes around text to indicate any changes in language.
 
 Identifying the language of the audience
 •       Consider using a meta tag or HTTP Content-Language header to indicate the language of the intended audience for a page.
 •       If there is more than one primary language, use a meta tag or HTTP Content-Language header to list the languages of the intended audience.
 
 
 Other stuff to express in the detail:
 What's the difference between audience and content declaration, and why use different approaches.
 Acknowledge that there's no good solution for the <title> element or other stuff in <head> for documents with multiple primary languages.
 
 
 RI
 
 
 ============
 Richard Ishida
 W3C
 
 contact info:
 http://www.w3.org/People/Ishida/

 
 W3C Internationalization:
 http://www.w3.org/International/
 
 
 



Received on Wednesday, 9 June 2004 07:28:25 UTC