xml:lang and safe sex from Patrick Stickler on 2002-03-04 (w3c-rdfcore-wg@w3.org from March 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Mon, 04 Mar 2002 13:35:18 +0200
To: RDF Core <w3c-rdfcore-wg@w3.org>
Message-ID: <B8A92896.FEA7%patrick.stickler@nokia.com>
I fully agree with Jeremy's comments about xml:lang and RDF literals in

http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Mar/0017.html

and that all WG decisions to date reflected therein should be upheld.


That said...   ;-)

Unfortunately, there are certain aspects of xml:lang usage in the
context of RDF and similar markup languages such as MathML which
can be considered non-intuitive at best and misleading and incorrect
at worst.

Traditionally, XML was intended for, and has been used primarily
with linguistic content. The behavior of xml:lang inheritance seems
intuitively correct in such a context. However, in cases of markup
languages which entirely or predominantly describe non-linguistic
content, such as MathML and most RDF applications, the presently
defined, all-encompassing nature of xml:lang does not seem to work
in an optimal manner.

As Misha so colorfully put it, "xml:lang is like the HIV virus":
when it is inserted into an element, it infects every part of the
instance within its scope. What we need for applications such as
MathML and RDF is a kind of "condom" to prevent infection.

One possible such condom would be the standardized interpretation
of xml:lang="" as turning off any superordinate language qualification.

Although many folks (myself included) seem to consider that such an
interpretation is "obvious", it actually doesn't appear to have any
official standing, and an erratum or note of some sort may be needed to
standardize it.

Given such a standardized method of blocking inheritance of language
qualification from superordinate element scope, markup languages
which entirely or predominantly describe non-linguistic content
may either explicitly or implicitly define a default value of ""
for the xml:lang attribute for all or most of their
elements; in which case, xml:lang would only apply to the minimal
rather than maximal scope, and an element with such a default
would neither be "infected" by its parent element scope nor pass on
any such "infection" to its sub-elements.

Of course, such a treatment does not prohibit the use of xml:lang
wherever it is relevant, and use of xml:lang with clearly linguistic
properties such as rdfs:label, rdfs:comment, etc. is to be strongly
encouraged; but such language qualifications must be explicitly
specified for each case.

In the case of RDF/XML, all elements would be considered to have
an implicitly defined default value of "" for xml:lang so that
language inheritance is limited to the minimal scope as defined
above.
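
To make the difference between the two scoping rules concrete, here is a
minimal Python sketch. It is only an illustration under stated assumptions:
the function name effective_langs, the minimal_scope flag, and the sample
document are hypothetical and are not taken from any specification.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_langs(root, minimal_scope):
    """Return (tag, effective language) pairs in document order.

    minimal_scope=False: xml:lang is inherited from ancestors (standard XML).
    minimal_scope=True:  every element implicitly defaults to xml:lang="",
                         so only an element's own attribute applies to it
                         (the hypothetical rule discussed in this message).
    """
    result = []

    def visit(elem, inherited):
        own = elem.get(XML_LANG)
        if own is not None:
            lang = own          # an explicit value always wins
        elif minimal_scope:
            lang = ""           # implicit default blocks inheritance
        else:
            lang = inherited    # standard inheritance from the parent
        result.append((elem.tag, lang))
        for child in elem:
            visit(child, lang)

    visit(root, "")
    return result


# Hypothetical sample document, not RDF/XML.
doc = ET.fromstring(
    '<doc xml:lang="en"><item><label>Age</label>'
    '<label xml:lang="fi">Ikä</label></item></doc>'
)

inherited = effective_langs(doc, minimal_scope=False)
minimal = effective_langs(doc, minimal_scope=True)
```

Under inheritance, the first label ends up tagged "en"; under the minimal
scope rule it carries no language, while the explicit "fi" survives in both.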

This means that when folks write

<rdf:RDF ... xml:lang="en">
<rdf:Description rdf:about="#Bob">
    <age>35</age>
</rdf:Description>
<rdf:Description rdf:about="#age">
    <rdfs:label>Age</rdfs:label>
    <rdfs:label xml:lang="fi">Ikä</rdfs:label>
</rdf:Description>
</rdf:RDF>

we get

Bob age "35" .
age rdfs:label "Age" .
age rdfs:label "Ikä"-fi .

and not

Bob age "35"-en .
age rdfs:label "Age"-en .
age rdfs:label "Ikä"-fi .

The above RDF/XML would actually be considered to be equivalent to
the more explicit

<rdf:RDF ... xml:lang="en">
<rdf:Description rdf:about="#Bob" xml:lang="">
    <age xml:lang="">35</age>
</rdf:Description>
<rdf:Description rdf:about="#age" xml:lang="">
    <rdfs:label xml:lang="">Age</rdfs:label>
    <rdfs:label xml:lang="fi">Ikä</rdfs:label>
</rdf:Description>
</rdf:RDF>
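
As a rough check of the two readings, the following Python sketch pulls the
literal-valued properties out of a completed version of the example. It is
not an RDF parser: it handles only rdf:Description elements with rdf:about
and plain text properties. The default namespace http://example.org/terms#
and the function name literal_triples are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The example from this message, with namespace declarations filled in.
# The default namespace for <age> is a hypothetical placeholder.
SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns="http://example.org/terms#"
         xml:lang="en">
<rdf:Description rdf:about="#Bob">
    <age>35</age>
</rdf:Description>
<rdf:Description rdf:about="#age">
    <rdfs:label>Age</rdfs:label>
    <rdfs:label xml:lang="fi">Ikä</rdfs:label>
</rdf:Description>
</rdf:RDF>
"""


def literal_triples(xml_text, minimal_scope=True):
    """Return (subject, predicate, text, lang) tuples for literal properties.

    With minimal_scope=True an element without its own xml:lang is treated
    as xml:lang=""; with minimal_scope=False the value is inherited.
    """
    root = ET.fromstring(xml_text)
    root_lang = root.get(XML_LANG, "")
    triples = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        desc_lang = desc.get(XML_LANG, "" if minimal_scope else root_lang)
        for prop in desc:
            lang = prop.get(XML_LANG, "" if minimal_scope else desc_lang)
            triples.append((subject, prop.tag, prop.text, lang))
    return triples


proposed = literal_triples(SAMPLE, minimal_scope=True)
inherited = literal_triples(SAMPLE, minimal_scope=False)
```

Here proposed corresponds to the first set of triples above (only "Ikä" is
language qualified), and inherited to the second set, where "35" and "Age"
both pick up "en" from the rdf:RDF element.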

So, folks *can* still say in RDF that "35" is English if they really,
really want to ;-) but the default treatment would be that RDF literals
are not qualified for language unless explicitly specified on a
literal-by-literal basis.

And if language is specified, then we expect literal equality matching
to take that into account, as defined by Jeremy's proposed matching
algorithm, but language-qualified literals of inherently non-linguistic
content will be the rare exception rather than the norm.
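
Jeremy's matching algorithm is defined in the message linked above and is
not reproduced here. Purely to illustrate the general idea that a language
tag takes part in literal equality, here is a small Python sketch. The
Literal type, the literals_match function, and the rule itself (equal
strings plus equal tags, compared case-insensitively) are assumptions, not
the WG's definition.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A literal as a string plus an optional language tag (hypothetical)."""
    text: str
    lang: str = ""  # "" stands for "no language qualification"


def literals_match(a, b):
    """Assumed rule: strings must be identical and the language tags must
    be equal, ignoring case (language tags are case-insensitive)."""
    return a.text == b.text and a.lang.lower() == b.lang.lower()


plain_35 = Literal("35")
english_35 = Literal("35", "en")
```

With this rule, plain_35 and english_35 are distinct literals, which is why
an unwanted inherited "en" on a numeral matters for matching.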

Eh?

PS: An alternative to xml:lang="" is Jeremy's proposal to consider
xml:lang="*" based on RFC 3066 as the default for literals unspecified
for language. And it seems like that would accomplish the same thing.
Which is chosen may just boil down to a matter of taste, as to whether
"no language" or "all languages" seems more intuitively correct to
say about e.g. a numeral.
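
For reference, RFC 3066 defines "*" as a language range that matches any
tag, while any other range matches a tag that is equal to it or begins with
it followed by "-". A minimal Python sketch of that matching follows; the
function name range_matches is hypothetical, and how an absent tag should
be treated is not settled here.

```python
def range_matches(language_range, tag):
    """Match a language range against a language tag, RFC 3066 style."""
    if language_range == "*":
        return True  # the wildcard range matches every tag
    r = language_range.lower()
    t = tag.lower()
    # Equal, or the range is a prefix followed by a "-" separator.
    return t == r or t.startswith(r + "-")


wildcard_fi = range_matches("*", "fi")
en_covers_en_gb = range_matches("en", "en-GB")
en_covers_fi = range_matches("en", "fi")
```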

Patrick

--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com
Received on Monday, 4 March 2002 06:33:30 UTC