xml:lang semantics

I'd like to clear up some possible misunderstandings with respect to
xml:lang.

1.	xml:lang is intended solely to allow the author to specify the
(natural) language of some given #PCDATA content, including the "empty"
(i.e., not specified) language expressed as xml:lang="";
2.	one of the important aspects of xml:lang in this regard is its
use in performing line layout, which is based on XSL 1.0 semantics and
which takes into account the language of the content as expressed by
xml:lang to perform certain implied font mappings (e.g., selecting
between Chinese, Japanese, and Korean renditions of a unified
ideographic character, selecting between different joining treatments
of Arabic letters depending on language and font style, etc.);
3.	xml:lang is NOT explicitly intended to be used by authors to
express the intended consumer of such content; a downstream consumer of
the content may apply it in this way, but such usage is in that
application's semantic domain, and not the domain of the DFXP document
instance;
4.	there is no constraint implied by DFXP, nor should there be such
a constraint, on how a DFXP compliant transformation processor makes use
of information in a DFXP document instance, including selection of
content based on the values of xml:lang attributes; in any case, the
type of transformation performed and its criteria for transformation are
in the DFXP application domain as such, and not in the DFXP information
set domain; [and here, by 'application domain', I mean whatever the
author of the transformation processor desires and expects to achieve];
5.	there is NO selection or filtering semantics specified or
implied by DFXP that uses xml:lang as a criterion, and, notwithstanding
my previous email suggesting a possibility of adding such an extension,
I am now inclined to agree with those who think that such an extension
should not be defined by DFXP;
6.	xml:lang was designed to be able to apply to an element, its
attributes, and its content (including child elements), where children
can override an ancestor's indication of language;
7.	DFXP presently supports the express use of xml:lang on all
element types, even on those element types that do not immediately
appear to contain content as such;
8.	DFXP requires the tt element to specify an xml:lang attribute,
even if its value is the empty string; the value of xml:lang on the tt
element is referred to as the "document language", and effectively
serves as a default for all content in the document (in the absence of
any other xml:lang specification);
9.	there are a variety of mechanisms that can be used by authors to
explicitly express their intent in a conformant DFXP document instance
to downstream processors, including the use of non-TT elements and
attributes, the use of non-TT metadata, and the use of the new
requiredExtensions attribute expected to be added to DFXP.
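
To make points 4, 6, and 8 concrete, here is a minimal sketch in Python
of how a processor might resolve the effective language of each element
by inheritance, and then (purely as an application-domain choice, not
as anything DFXP specifies) select content by language. The sample
document, the omission of the TT namespace, and the function names
effective_languages and select_by_language are all assumptions made for
illustration; only the xml:lang attribute itself comes from the
discussion above.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical DFXP-like sample; the TT namespace is omitted for brevity.
SAMPLE = """\
<tt xml:lang="en">
  <body>
    <div>
      <p>Hello</p>
      <p xml:lang="ja">Konnichiwa</p>
      <p xml:lang="">???</p>
    </div>
  </body>
</tt>
"""


def effective_languages(root):
    """Yield (element, language) pairs in document order.

    An element's language is its own xml:lang if present (including the
    empty string), otherwise the language inherited from its parent.
    """
    def walk(elem, inherited):
        lang = elem.get(XML_LANG, inherited)
        yield elem, lang
        for child in elem:
            yield from walk(child, lang)

    # Assumption: a root without xml:lang is treated as "unspecified".
    yield from walk(root, "")


def select_by_language(root, wanted):
    """Application-level filter (not a DFXP semantic): return the text of
    p elements whose effective language equals `wanted`."""
    return [
        elem.text
        for elem, lang in effective_languages(root)
        if elem.tag == "p" and lang == wanted
    ]


root = ET.fromstring(SAMPLE)
langs = [(elem.tag, lang) for elem, lang in effective_languages(root)]
```

With this sample, the first p inherits "en" from tt (the document
language), the second overrides it with "ja", and the third overrides
it with the empty, unspecified language. Whether a processor then calls
something like select_by_language at all is entirely up to the author
of that processor.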

I believe that DFXP does not need any normative change with respect to
xml:lang; however, I do not object to adding informative content to the
spec that reminds readers of some or all of the above.

G.

Received on Friday, 5 December 2008 03:44:56 UTC