difference in semantics of lang attribute between XML and HTML

While reviewing the WS i18n usage scinario document, I have
noticed that there is a semantic difference between XML and HTML
language specs.

In section 8.1 of HTML 4.01 spec at
http://www.w3.org/TR/html401/struct/dirlang.html#h-8.1
there is an example that reads:
	<P><Q lang="en">Her super-powers were the result of
	&gamma;-radiation,</Q> he explained.</P>

and:
<HTML lang="fr">
<BODY>
...Interpreted as French...
<P lang="es">...Interpreted as Spanish...
<P>...Interpreted as French again...
<P>...French text interrupted by<EM lang="ja">some
         Japanese</EM>French begins here again...
</BODY>

From these examples and explanaition, the lang
attribute just aids the client system to do a better job by
giving it a "hint".  The various elements with different
lang attributes are *not* mutually exclusive.  All of them are
rendered.  This HTML fragment:

	<p lang="en-GB">What colour is it?</p>
	<p lang="en-US">What color is it?</p>

will result in both sentences to appear on screen.

But in XML, the lang attribute provides a mutually
exclusive, choice semantics, according to the description
and examples given in Section 2.12 of XML 1.0:
http://www.w3.org/TR/REC-xml.html#sec-lang-tag
The screen would be empty if the above fragment
(substitute lang with xml:lang) is interpreted by XML
parser running in non English locale (or even in
en-CA locale!).

I wonder this deviation in semantics is a conscious 
decision made for XML. 

I also wonder if there should be a mechanism to
provide a fall back lang tag so that at least one
element is selected as a fall back when none of the 
alternative elements is chosen because of the
mismatch with lang attribute.  Has this been
discussed?

----
T. "Kuro" Kurosaka, Internationalization Architect
IONA Technologies, Santa Clara, CA USA / +1 408 350-9684 

Received on Thursday, 2 January 2003 15:44:33 UTC