Specifying the primary language in HTML

Hello everyone,

I took an action item to research the various ways to specify the primary
language in HTML. As far as I could determine, specifying the natural
language of an HTML-document has only been possible since HTML 4. 

(Side note: This may have serious repercussions for the achievability of
level 1 because it means you will not be able to comply with WCAG with
versions of HTML prior to HTML 4. This is something that I think should be
discussed in the group at large.)

There turn out to be several ways to specify the natural language in HTML
4+. Some of the resources contradict each other as to which method is
preferred. That's why I first want to present the problems and my views, to
see whether you agree with them before I write the techniques.

I have found three ways to specify the primary language in HTML:

LANG-ATTRIBUTE OF THE HTML ELEMENT
Use the lang-attribute of the HTML element.
Example: <html lang="nl">
This has been around since HTML 4. [1]

XML:LANG-ATTRIBUTE OF THE HTML ELEMENT
Use the xml:lang-attribute of the HTML element.
Example: <html xml:lang="nl">
The xml:lang attribute was introduced in XHTML 1.0 and is only valid for
XHTML 1+. The xml:lang-attribute is to be used together with the
lang-attribute to make sure the XHTML documents render on older HTML user
agents. [2]

META-ELEMENT USING HTTP-EQUIV AND CONTENT ATTRIBUTES
Use the meta-element to specify the language using the http-equiv and the
content attribute. 
Example: <META HTTP-EQUIV="Content-Language" Content="fr, en">. 
This causes the language(s) to be transmitted in the Content-Language field
of the HTTP header. The order of the languages is important here: the first
language is the language of the base content.
A note that was written to clarify the HTML 4 specification recommends using
the meta element to specify the language of the document as a whole, in
preference to the lang-attribute. The advantages are that the language(s)
will be sent in the HTTP-header and that you can specify multiple
languages. [3]
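
To make the difference between the three methods concrete, here is a rough sketch of how a checking tool might look for each of them in a page. It uses Python's standard html.parser module; the names LanguageDeclarations and find_language_declarations are made up for this illustration and do not come from any specification or existing tool.

```python
from html.parser import HTMLParser


class LanguageDeclarations(HTMLParser):
    """Collects the three kinds of primary-language declaration (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.lang = None          # value of <html lang="...">
        self.xml_lang = None      # value of <html xml:lang="...">
        self.meta_languages = []  # values from <meta http-equiv="Content-Language" ...>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META HTTP-EQUIV=...>
        # arrives here as tag "meta" with attribute "http-equiv".
        attributes = dict(attrs)
        if tag == "html":
            self.lang = attributes.get("lang")
            self.xml_lang = attributes.get("xml:lang")
        elif tag == "meta":
            if (attributes.get("http-equiv") or "").lower() == "content-language":
                content = attributes.get("content") or ""
                # Keep the order: the first language is the base language.
                self.meta_languages = [
                    code.strip() for code in content.split(",") if code.strip()
                ]


def find_language_declarations(markup):
    """Return the language declarations found in the given markup string."""
    parser = LanguageDeclarations()
    parser.feed(markup)
    parser.close()
    return {
        "lang": parser.lang,
        "xml:lang": parser.xml_lang,
        "meta": parser.meta_languages,
    }
```

Calling find_language_declarations on a page's source returns a dictionary with one entry per method, so a page that uses none of them shows up as two None values and an empty list.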

===============================

My personal approach would be:

If you have a document with multiple primary languages (for example: a
Canadian document which is half in English, half in French), use the <META
HTTP-EQUIV="Content-Language" Content="fr, en"> technique together with the
HTML technique for identifying changes in language to denote the language of
the various sections. 

If you have a document with just 1 primary language, and you use HTML 4, use
<html lang="nl">. 

If you have a document with just 1 primary language and you use XHTML 1+,
use <html lang="nl" xml:lang="nl">
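
The three rules of this proposal can be written down as a small decision function. This is only a sketch of the approach as I propose it here, not an agreed technique; the function name recommended_markup and its parameters are hypothetical.

```python
def recommended_markup(languages, xhtml=False):
    """Return the markup this proposal suggests for the given primary languages.

    languages: list of language codes, base language first (e.g. ["fr", "en"]).
    xhtml: True for XHTML 1+, False for HTML 4.
    """
    if not languages:
        raise ValueError("at least one language code is required")

    if len(languages) > 1:
        # Multiple primary languages: use the meta element. The sections in
        # each language still need their own language markup in the body.
        # Written in the HTML 4 form used in this message; XHTML syntax for
        # this case is not covered by the proposal.
        content = ", ".join(languages)
        return f'<META HTTP-EQUIV="Content-Language" Content="{content}">'

    code = languages[0]
    if xhtml:
        # One primary language in XHTML 1+: both attributes together.
        return f'<html lang="{code}" xml:lang="{code}">'

    # One primary language in HTML 4: lang attribute only.
    return f'<html lang="{code}">'
```

For example, recommended_markup(["nl"], xhtml=True) gives the combined lang and xml:lang form, and recommended_markup(["fr", "en"]) gives the meta element with French as the base language.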

If you agree with this approach, I can write these three HTML techniques and
clarify within the techniques when to use which. I'm only able to work on
this up until Wednesday so I would appreciate any quick feedback.

Yvette Hoitink
Heritas, Enschede, the Netherlands
E-mail: y.p.hoitink@heritas.nl
WWW: http://www.heritas.nl 

[1] http://www.w3.org/TR/REC-html40/struct/dirlang.html#h-8.1
[2] http://www.w3.org/TR/xhtml1/#C_7
[3] http://www.w3.org/TR/1998/NOTE-html-lan-19980313

Received on Monday, 12 September 2005 12:18:55 UTC