Encoding language information

Hi,

  First, unlike HTML 4.01, XML 1.0 Second Edition does not specify
whether language information from a higher-level protocol (such as
HTTP) applies when no xml:lang attribute is in scope. HTML 4.01 states
in [1]:

[...]
An element inherits language code information according to the following
order of precedence (highest to lowest):

 * The lang attribute set for the element itself. 
 * The closest parent element that has the lang attribute set (i.e., the
   lang attribute is inherited). 
 * The HTTP "Content-Language" header (which may be configured in a
   server). For example: 

     Content-Language: en-cockney

 * User agent default values and user preferences.
[...]

Let's ignore the last item, but why the discrepancy? It creates an odd
situation, especially for XHTML documents, which are subject to both
sets of rules. The user agents I tested did inherit from the HTTP
header, though. I propose changing this so that implementations are
required to take higher-level protocol information into account.
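
To make the proposal concrete, here is a minimal sketch of the lookup
order quoted above, applied to xml:lang. It is only an illustration:
the function name effective_lang and its content_language parameter are
hypothetical, the fallback to the HTTP header is the behaviour being
proposed (not something XML 1.0 currently requires), and a
Content-Language value listing several languages is not handled.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_lang(root, target, content_language=None):
    """Return the language that applies to `target`, or None.

    `content_language` stands in for a single language tag taken from
    an HTTP Content-Language header (an assumption of this sketch).
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}

    node = target
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is not None:
            # 1. The element itself, or 2. the closest ancestor.
            return lang
        node = parents.get(node)

    # 3. Proposed fallback: higher-level protocol information.
    return content_language


doc = ET.fromstring('<doc><p xml:lang="de"><em>Hallo</em></p><p>Hi</p></doc>')
first_p, second_p = doc.findall("p")
em = first_p.find("em")

print(effective_lang(doc, em, "en"))        # de
print(effective_lang(doc, second_p, "en"))  # en
```

Without the third step, the second paragraph would simply have no
language information, even though the server declared one.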

Second, I wonder why the ISO 639-2 language code "und" (Undetermined)
was considered inappropriate for xml:lang, so that errata item E41 [2]
introduces a new way to express essentially the same thing with an
empty xml:lang="" attribute specification. Is xml:lang="" equivalent to
xml:lang="und"? Could this be clarified in the errata?
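
The question matters in practice because a parser reports the two forms
as different attribute values, so an application has to decide for
itself whether to treat them alike. A small sketch (the sample document
is made up for illustration):

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: one child uses the empty value, one uses "und".
sample = ET.fromstring(
    '<doc xml:lang="en"><a xml:lang=""/><b xml:lang="und"/></doc>'
)

a = sample.find("a").get(XML_LANG)
b = sample.find("b").get(XML_LANG)

# The parser hands back two distinct strings; any equivalence between
# them would have to come from the specification, not from the parser.
print(repr(a), repr(b), a == b)  # '' 'und' False
```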

[1] http://www.w3.org/TR/html401/struct/dirlang.html#h-8.1.2
[2] http://www.w3.org/XML/xml-V10-2e-errata#E41

regards.

Received on Wednesday, 12 March 2003 19:16:12 UTC