Encoding language information

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Thu, 13 Mar 2003 01:17:05 +0100
To: xml-editor@w3.org
Message-ID: <3e7ec953.306860171@smtp.bjoern.hoehrmann.de>


  First, unlike HTML, XML 1.0 Second Edition does not specify whether
xml:lang inherits the language specification from higher-level protocol
information. HTML 4.01 reads in [1]:

An element inherits language code information according to the following
order of precedence (highest to lowest):

 * The lang attribute set for the element itself. 
 * The closest parent element that has the lang attribute set (i.e., the
   lang attribute is inherited). 
 * The HTTP "Content-Language" header (which may be configured in a
   server). For example: 

     Content-Language: en-cockney

 * User agent default values and user preferences.

Let's ignore the last item, but why this discrepancy? It creates an odd
situation, especially for XHTML documents. The user agents I tested did
inherit from the HTTP header, though. I would like to propose changing
XML 1.0 to require implementations to take higher-level protocol
information into consideration.
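
To make the proposed lookup order concrete, here is a minimal sketch in
Python, assuming the HTML 4.01 precedence were applied to xml:lang. The
function name effective_lang and its parameters are hypothetical and not
taken from any specification; the sketch also assumes the Content-Language
header carries a single language tag. An empty xml:lang="" is returned
as-is, since its exact meaning is the open question raised below.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predeclared XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_lang(root, target, content_language=None, default=None):
    """Resolve the language of `target` (hypothetical helper).

    Precedence, highest to lowest:
      1. xml:lang on the element itself
      2. xml:lang on the closest ancestor that sets it
      3. the HTTP Content-Language value, if any
      4. a user agent default
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}

    node = target
    while node is not None:
        if XML_LANG in node.attrib:
            # May be the empty string; its interpretation is left open here.
            return node.attrib[XML_LANG]
        node = parents.get(node)

    if content_language:
        return content_language
    return default


doc = ET.fromstring('<doc><p xml:lang="de"><em/></p><p/></doc>')
first_p, second_p = doc.findall("p")
em = first_p.find("em")

print(effective_lang(doc, em, content_language="en-cockney"))
print(effective_lang(doc, second_p, content_language="en-cockney"))
```

In this sketch the em element takes its language from the enclosing p,
while the second p, which has no xml:lang anywhere above it, falls back
to the header value.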

Second, I wonder why the ISO 639-2 language code "und" (Undetermined)
was considered inappropriate for xml:lang, so that errata item E41 [2]
introduces a new means of specifying essentially the same thing: an
empty xml:lang="" attribute specification. Is xml:lang="" equivalent to
xml:lang="und"? Could this information please be added to the errata?

[1] http://www.w3.org/TR/html401/struct/dirlang.html#h-8.1.2
[2] http://www.w3.org/XML/xml-V10-2e-errata#E41

Received on Wednesday, 12 March 2003 19:16:12 UTC
