Re: How do I say ‘this is not in any language’ in XHTML/HTML

From: Dan Brickley <danbri@danbri.org>
Date: Tue, 13 Mar 2007 19:58:15 +0000
Message-ID: <45F70257.70709@danbri.org>
To: Richard Ishida <ishida@w3.org>
Cc: www-international@w3.org

Richard Ishida wrote:
> This is an attempt to summarise and move forward some ideas in a thread on www-international@w3.org by Christophe Strobbe, Martin Duerst, Bjoern Hoermann and Tex Texin.
> http://lists.w3.org/Archives/Public/www-international/2005JulSep/0163.html
> 
> You should always use the lang and/or xml:lang attributes in HTML or XHTML to identify the human language of the content so that applications such as voice browsers, style sheets, and the like can process that text. (See Declaring Language in XHTML and HTML[1] for the details.)
> 
> You can override that language setting for a part of the document that is in a different language, e.g. a French quotation in an English document, by using the same attribute(s) around the relevant bit of text.
> 
> Suppose you have some text that is not in any language, such as type samples, part numbers, perhaps program code. How would you say that this was no language in particular?
> 
> There are a number of possible approaches:
> 
>    1. A few years ago we introduced into the XML spec the idea that xml:lang="" conveys that ‘there is no language information available’. (See 2.12 Language Identification[2])
> 
>    2. An alternative is to use the value ‘und’, for ‘undetermined’.
> 
>    3. In the IANA Subtag Registry[3] there is another tag, ‘zxx’, that means ‘No linguistic content’. Perhaps this is a better choice. It has my vote at the moment.

(3) gets my vote too.

The other options only indicate an absence of information about the 
language, it seems. (2) suggests that some effort was made to determine 
it and failed; (1) seems silent on the reasons for its unavailability.

Imagine a tool that did its heuristic best (e.g. using info about the 
author's preferred language, natural language parsers, source documents, 
etc.) to figure out what language some HTML was written in.

It would make sense to invoke such a tool triggered on xml:lang="", 
wouldn't it? But if we go around encouraging folk to use that for ‘this 
is not a language’, that scenario gets rather broken. Similarly with 'und'.
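
To make that scenario concrete, here is a rough sketch of how such a tool 
might branch on the declared value. Everything in it is hypothetical: the 
function name resolve_language, the guess callback (standing in for 
whatever heuristics the tool has) and the convention of returning None for 
'no language' are assumptions for illustration, not anything from a spec.

```python
from typing import Callable, Optional

# 'zxx' is the registry subtag for 'No linguistic content'.
NO_LINGUISTIC_CONTENT = "zxx"
# xml:lang="" and 'und' both signal missing information about the language.
NO_INFORMATION = {"", "und"}


def resolve_language(
    declared: Optional[str],
    text: str,
    guess: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return a language tag for text, or None if it is not in any language.

    guess is a hypothetical stand-in for the tool's heuristics.
    """
    tag = (declared or "").strip()
    key = tag.lower()  # language tags compare case-insensitively
    if key == NO_LINGUISTIC_CONTENT:
        # The author said this is not language at all, so never guess.
        return None
    if key in NO_INFORMATION:
        # Nothing is known about the language, so guessing is reasonable.
        return guess(text)
    # A real declaration: trust it as written.
    return tag
```

If authors used xml:lang="" or 'und' to mean 'not a language', the second 
branch would run the guesser on part numbers and type samples, which is 
exactly the breakage described above.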

So... how should zxx be pronounced? :)

Dan
Received on Tuesday, 13 March 2007 19:58:55 GMT