- From: Patrick Stickler <patrick.stickler@nokia.com>
- Date: Mon, 04 Mar 2002 13:35:18 +0200
- To: RDF Core <w3c-rdfcore-wg@w3.org>
I fully agree with Jeremy's comments about xml:lang and RDF literals in

http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Mar/0017.html

and that all WG decisions to date reflected therein should be upheld.

That said... ;-)

Unfortunately, there are certain aspects of xml:lang usage in the context of RDF and similar markup languages such as MathML which can be considered non-intuitive at best and misleading and incorrect at worst.

Traditionally, XML was intended for, and has been used primarily with, linguistic content. The behavior of xml:lang inheritance seems intuitively correct in such a context. However, in markup languages which entirely or predominantly describe non-linguistic content, such as MathML and most RDF applications, the presently defined, all-encompassing nature of xml:lang does not seem to work in an optimal manner.

As Misha so colorfully put it, "xml:lang is like the HIV virus": when it is inserted into an element, it infects every part of the instance within its scope.

What we need for applications such as MathML and RDF is a kind of "condom" to prevent infection.

One possible such condom would be the standardized interpretation of xml:lang="" as turning off any superordinate language qualification. Although many folks (myself included) seem to consider such an interpretation "obvious", it doesn't actually appear to have any official standing, and an erratum or note of some sort may be needed to standardize it.
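The inheritance behavior described above, together with the proposed reading of xml:lang="" as a reset, can be sketched roughly as follows. This is only an illustration under assumptions: the function name effective_langs and the element names in DOC are hypothetical, and treating xml:lang="" as "no language" is the interpretation proposed in this message, not something taken from a specification.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined "xml" prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample document: the root sets a language, one child resets it.
DOC = (
    '<doc xml:lang="en">'
    "<title>Age</title>"
    '<math xml:lang=""><mn>35</mn></math>'
    "</doc>"
)


def effective_langs(xml_text):
    """Return (tag, language) pairs in document order.

    xml:lang is inherited from the nearest ancestor that sets it.
    An empty value is treated as "no language" (assumed interpretation),
    which also stops inheritance for everything below that element.
    """
    result = []

    def walk(elem, inherited):
        # Use the element's own xml:lang if present, else the inherited one.
        lang = elem.get(XML_LANG, inherited)
        if lang == "":
            lang = None  # the proposed "off switch"
        result.append((elem.tag, lang))
        for child in elem:
            walk(child, lang)

    walk(ET.fromstring(xml_text), None)
    return result


if __name__ == "__main__":
    for tag, lang in effective_langs(DOC):
        print(tag, lang)
```

Here title picks up "en" from the root, while math and everything inside it end up with no language at all, which is the blocking effect being asked for.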
Given such a standardized method of blocking inheritance of language qualification from superordinate element scope, markup languages which entirely or predominantly describe non-linguistic content may either explicitly or implicitly define a default value of "" for the xml:lang attribute for all or most of their elements. In that case, xml:lang would apply only to the minimal rather than the maximal scope, and an element with such a default would neither be "infected" by its parent element scope nor pass on any such "infection" to its sub-elements.

Of course, such a treatment does not prohibit the use of xml:lang wherever it is relevant, and use of xml:lang with clearly linguistic properties such as rdfs:label, rdfs:comment, etc. is to be strongly encouraged; but such language qualifications must be explicitly specified for each case.

In the case of RDF/XML, all elements would be considered to have an implicitly defined default value of "" for xml:lang so that language inheritance is limited to the minimal scope as defined above.

This means that when folks write

   <rdf:RDF ... xml:lang="en">
      <rdf:Description rdf:about="#Bob">
         <age>35</age>
      </rdf:Description>
      <rdf:Description rdf:about="#age">
         <rdfs:label>Age</rdfs:label>
         <rdfs:label xml:lang="fi">Ikä</rdfs:label>
      </rdf:Description>
   </rdf:RDF>

we get

   Bob age "35" .
   age rdfs:label "Age" .
   age rdfs:label "Ikä"-fi .

and not

   Bob age "35"-en .
   age rdfs:label "Age"-en .
   age rdfs:label "Ikä"-fi .

The above RDF/XML would actually be considered to be equivalent to the more explicit

   <rdf:RDF ... xml:lang="en">
      <rdf:Description rdf:about="#Bob" xml:lang="">
         <age xml:lang="">35</age>
      </rdf:Description>
      <rdf:Description rdf:about="#age" xml:lang="">
         <rdfs:label xml:lang="">Age</rdfs:label>
         <rdfs:label xml:lang="fi">Ikä</rdfs:label>
      </rdf:Description>
   </rdf:RDF>

So, folks *can* still say in RDF that "35" is English if they really, really want to ;-) but the default treatment would be that RDF literals are not qualified for language unless explicitly specified on a literal-by-literal basis.

And if language is specified, then we expect literal equality matching to take that into account, as defined by Jeremy's proposed matching algorithm; but language-qualified literals of inherently non-linguistic content will be the rare exception rather than the norm.

Eh?

PS: An alternative to xml:lang="" is Jeremy's proposal to consider xml:lang="*", based on RFC 3066, as the default for literals unspecified for language. It seems like that would accomplish the same thing. Which is chosen may just boil down to a matter of taste, as to whether "no language" or "all languages" seems more intuitively correct to say about e.g. a numeral.

Patrick

--
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com
Received on Monday, 4 March 2002 06:33:30 UTC