
RE: declaring language in html/xhtml

From: Jon Hanna <jon@hackcraft.net>
Date: Sat, 11 Dec 2004 11:01:23 -0000
To: "'Alan Pierce'" <apierce411@hotmail.com>, <www-international@w3.org>
Message-Id: <20041211110138.C558B2DAC5580@postie.hosting365.ie>

> Does it make any practical difference to serve html with the html tag 
> marked-up as xhtml, like:
> <html lang="ja-JP" xml:lang="ja_JP"  
> xmlns="http://www.w3.org/1999/xhtml">
> as opposed to simply
> <html lang="ja-JP"> ?

There are a few things here.

1. ja-JP means the dialect of Japanese spoken in Japan, as opposed to the one
or more dialects spoken elsewhere. I've been told that there isn't any other
country with a different form of Japanese, so the correct language tag is
just "ja". Compare British English, "en-GB", which does benefit from the
second part of the tag because it differentiates it from en-IE, en-US, etc.
(I don't know much about Japanese, but I've seen ja-JP used as an example of
just this sort of mistake by those who know more than I do.)

2. ja_JP is incorrect syntax. Both lang and xml:lang take RFC 3066 tags,
which separate subtags with hyphens, never underscores (a typo?).

3. The lang attribute is only in XHTML for backwards compatibility, so that
when an old HTML tool that doesn't grok XHTML sees the XHTML, it will act as
if it is HTML and be able to determine the language. By contrast,
general-purpose XML tools that don't know anything specific about XHTML (and
the ability to use such tools is the main practical advantage in using XHTML
rather than HTML) will understand xml:lang, but not lang. As such,
xml:lang is the one that you must use, and lang is the one that you can use
as well:

<html lang="ja">
<!-- HTML 4.01 or earlier, Japanese -->

<blah xml:lang="ja">
<!-- Some form of XML, Japanese -->

<html xml:lang="ja">
<!-- Some form of XML, Japanese (Not XHTML, as there's no namespace) -->

<html xml:lang="ja" xmlns="http://www.w3.org/1999/xhtml">
<!-- XHTML, Japanese -->

<html lang="ja" xmlns="http://www.w3.org/1999/xhtml">
<!-- XHTML, Japanese, but general XML tools won't realise this. -->

<html xml:lang="ja" lang="ja" xmlns="http://www.w3.org/1999/xhtml">
<!-- XHTML, Japanese, backwards compatible with old HTML user-agents -->

<html xml:lang="ja" lang="en" xmlns="http://www.w3.org/1999/xhtml">
<!-- Obviously a bug, but the way it's interpreted is worth looking at.
An XML tool will see it as Japanese.
An HTML tool will see it as English.
An XHTML tool will see xml:lang as overriding lang, since lang is just for
backwards compatibility, and hence see it as being Japanese -->
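
The interpretations listed in the examples above can be sketched in a few
lines of Python. This is only an illustration under assumptions: the function
name effective_language and the three tool labels ("xml", "html", "xhtml")
are made up for this sketch, the root element is written as self-closing so
that it parses on its own, and a real HTML tool would not be using an XML
parser at all.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predeclared "xml" prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_language(root_markup, tool):
    """Return the language a given kind of tool would see on the root element.

    `tool` is one of the hypothetical labels "xml", "html" or "xhtml".
    """
    root = ET.fromstring(root_markup)
    xml_lang = root.get(XML_LANG)   # what a generic XML tool understands
    html_lang = root.get("lang")    # what an old HTML tool understands

    if tool == "xml":
        return xml_lang
    if tool == "html":
        return html_lang
    if tool == "xhtml":
        # xml:lang overrides lang; lang is only a fallback.
        return xml_lang if xml_lang is not None else html_lang
    raise ValueError("unknown tool kind: %r" % tool)


buggy = '<html xml:lang="ja" lang="en" xmlns="http://www.w3.org/1999/xhtml"/>'
for kind in ("xml", "html", "xhtml"):
    print(kind, effective_language(buggy, kind))
```

Run against the deliberately buggy last example, the XML and XHTML cases
report "ja" and the HTML case reports "en", matching the comment above.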

All in all, I'd recommend you keep using the fuller form (both lang and
xml:lang) until the general level of tool support means you can drop lang
and just use xml:lang.
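
If it helps, that recommendation and point 2 can be combined into a small
check. Again this is a rough sketch rather than a real validator: the name
check_root_language is hypothetical, the pattern only tests the RFC 3066
syntax (a primary subtag of 1 to 8 letters, then hyphen-separated subtags of
1 to 8 letters or digits) and says nothing about whether the subtags are
registered, and the root element is assumed to be well-formed XML.

```python
import re
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Syntax only: 1*8ALPHA followed by any number of "-" 1*8(ALPHA / DIGIT).
RFC3066_SYNTAX = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def check_root_language(root_markup):
    """Return a list of problems with lang / xml:lang on the root element."""
    root = ET.fromstring(root_markup)
    xml_lang = root.get(XML_LANG)
    html_lang = root.get("lang")
    problems = []

    if xml_lang is None:
        problems.append("missing xml:lang (generic XML tools see no language)")
    if html_lang is None:
        problems.append("missing lang (old HTML tools see no language)")

    for name, value in (("xml:lang", xml_lang), ("lang", html_lang)):
        if value is not None and not RFC3066_SYNTAX.fullmatch(value):
            problems.append("%s=%r is not RFC 3066 syntax" % (name, value))

    # Language tags are case-insensitive, so compare them lowercased.
    if (xml_lang is not None and html_lang is not None
            and xml_lang.lower() != html_lang.lower()):
        problems.append("lang and xml:lang disagree")

    return problems


original = ('<html lang="ja-JP" xml:lang="ja_JP" '
            'xmlns="http://www.w3.org/1999/xhtml"/>')
for problem in check_root_language(original):
    print(problem)
```

For the markup in the original question this flags the underscore in
xml:lang and the resulting mismatch with lang; the fuller form with "ja" in
both attributes comes back with no problems.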

Jon Hanna
Work: <http://www.selkieweb.com/>
Play: <http://www.hackcraft.net/>
Chat: <irc://irc.freenode.net/selkie> 
Received on Saturday, 11 December 2004 11:01:50 UTC
